{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 04 数据的容器 ———— Pandas\n",
    "\n",
    "参考视频\n",
    "https://turingplanet.org/pandas_intro_tutorial/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.5 DataFrame，数据表"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "上述数据类型都是在底层工作的，我们实际要处理的数据很想excel表格。\n",
    "- 二维表，有行有列，行有编号1,2,3...，列有名称\n",
    "- 每个单元格（cell）里存放的数据可能是数字、小数、科学计数、文本、布尔数据（是或否）、时间（年月日、时分秒）\n",
    "- 需要整列运算数据\n",
    "- 需要多行筛选、排序\n",
    "- 需要把一个数据表拆分成多个\n",
    "- 需要把多个数据表合并成一个\n",
    "\n",
    "DataFrame应运而生，它包含了excel所有的数据处理功能，甚至更多。\n",
    "DataFrame不是python内置的，需要先导入包再使用。\n",
    "\n",
    "### 2.5.1 Dataframe数据加载"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Average Rt(min)</th>\n",
       "      <th>Average Mz</th>\n",
       "      <th>Metabolite name</th>\n",
       "      <th>Adduct type</th>\n",
       "      <th>Formula</th>\n",
       "      <th>Ontology</th>\n",
       "      <th>INCHIKEY</th>\n",
       "      <th>SMILES</th>\n",
       "      <th>MS/MS matched</th>\n",
       "      <th>Matched peaks count</th>\n",
       "      <th>Matched peaks percentage</th>\n",
       "      <th>Total score</th>\n",
       "      <th>G1_2</th>\n",
       "      <th>G1_3</th>\n",
       "      <th>G2_1</th>\n",
       "      <th>G2_2</th>\n",
       "      <th>G2_3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1.616</td>\n",
       "      <td>72.08069</td>\n",
       "      <td>C12244;C12244;CFMID</td>\n",
       "      <td>[M+H]+</td>\n",
       "      <td>C4H9N</td>\n",
       "      <td>NaN</td>\n",
       "      <td>ASVKKRLMJCWVQF-UHFFFAOYSA-N</td>\n",
       "      <td>C=CCCN</td>\n",
       "      <td>True</td>\n",
       "      <td>7</td>\n",
       "      <td>0.57</td>\n",
       "      <td>1.47</td>\n",
       "      <td>2.153791e+08</td>\n",
       "      <td>3.298944e+08</td>\n",
       "      <td>3.024013e+08</td>\n",
       "      <td>2.554517e+08</td>\n",
       "      <td>2.960232e+08</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1.458</td>\n",
       "      <td>72.08071</td>\n",
       "      <td>C12244;C12244;CFMID</td>\n",
       "      <td>[M+H]+</td>\n",
       "      <td>C4H9N</td>\n",
       "      <td>NaN</td>\n",
       "      <td>ASVKKRLMJCWVQF-UHFFFAOYSA-N</td>\n",
       "      <td>C=CCCN</td>\n",
       "      <td>True</td>\n",
       "      <td>7</td>\n",
       "      <td>0.57</td>\n",
       "      <td>1.47</td>\n",
       "      <td>5.680004e+07</td>\n",
       "      <td>4.699128e+07</td>\n",
       "      <td>5.367473e+07</td>\n",
       "      <td>8.490319e+07</td>\n",
       "      <td>4.850104e+07</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1.674</td>\n",
       "      <td>74.05996</td>\n",
       "      <td>C00263;L-Homoserine;inhouse</td>\n",
       "      <td>[M-UNK]+</td>\n",
       "      <td>C4H9NO3</td>\n",
       "      <td>NaN</td>\n",
       "      <td>UKAUYVFTDYCKQA-VKHMYHEASA-N</td>\n",
       "      <td>N[C@H](C(O)=O)CCO</td>\n",
       "      <td>True</td>\n",
       "      <td>3</td>\n",
       "      <td>0.33</td>\n",
       "      <td>1.68</td>\n",
       "      <td>6.999451e+07</td>\n",
       "      <td>9.600511e+07</td>\n",
       "      <td>6.023802e+07</td>\n",
       "      <td>3.765911e+07</td>\n",
       "      <td>8.988370e+07</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Average Rt(min)  Average Mz              Metabolite name Adduct type  \\\n",
       "0            1.616    72.08069          C12244;C12244;CFMID      [M+H]+   \n",
       "1            1.458    72.08071          C12244;C12244;CFMID      [M+H]+   \n",
       "2            1.674    74.05996  C00263;L-Homoserine;inhouse    [M-UNK]+   \n",
       "\n",
       "   Formula Ontology                     INCHIKEY             SMILES  \\\n",
       "0    C4H9N      NaN  ASVKKRLMJCWVQF-UHFFFAOYSA-N             C=CCCN   \n",
       "1    C4H9N      NaN  ASVKKRLMJCWVQF-UHFFFAOYSA-N             C=CCCN   \n",
       "2  C4H9NO3      NaN  UKAUYVFTDYCKQA-VKHMYHEASA-N  N[C@H](C(O)=O)CCO   \n",
       "\n",
       "   MS/MS matched  Matched peaks count  Matched peaks percentage  Total score  \\\n",
       "0           True                    7                      0.57         1.47   \n",
       "1           True                    7                      0.57         1.47   \n",
       "2           True                    3                      0.33         1.68   \n",
       "\n",
       "           G1_2          G1_3          G2_1          G2_2          G2_3  \n",
       "0  2.153791e+08  3.298944e+08  3.024013e+08  2.554517e+08  2.960232e+08  \n",
       "1  5.680004e+07  4.699128e+07  5.367473e+07  8.490319e+07  4.850104e+07  \n",
       "2  6.999451e+07  9.600511e+07  6.023802e+07  3.765911e+07  8.988370e+07  "
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "df = pd.read_csv('data/LC-MS.csv')\n",
    "# 返回的df是DataFrame类型，也可以从其它数据库、文本文件、网页文件和剪贴板中导入数据\n",
    "# 也可以把数据导出到各种数据文件中\n",
    "df.head(3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Average Rt(min)</th>\n",
       "      <th>Average Mz</th>\n",
       "      <th>Metabolite name</th>\n",
       "      <th>Adduct type</th>\n",
       "      <th>Formula</th>\n",
       "      <th>Ontology</th>\n",
       "      <th>INCHIKEY</th>\n",
       "      <th>SMILES</th>\n",
       "      <th>MS/MS matched</th>\n",
       "      <th>Matched peaks count</th>\n",
       "      <th>Matched peaks percentage</th>\n",
       "      <th>Total score</th>\n",
       "      <th>G1_2</th>\n",
       "      <th>G1_3</th>\n",
       "      <th>G2_1</th>\n",
       "      <th>G2_2</th>\n",
       "      <th>G2_3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>259</th>\n",
       "      <td>18.717</td>\n",
       "      <td>680.47894</td>\n",
       "      <td>C03044;(2R,3R)-Butane-2,3-diol;inhouse</td>\n",
       "      <td>[M+UNK]+</td>\n",
       "      <td>C4H10O2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>OWBTYPJTUOEWEK-QWWZWVQMSA-N</td>\n",
       "      <td>C[C@@H](O)[C@H](O)C</td>\n",
       "      <td>True</td>\n",
       "      <td>12</td>\n",
       "      <td>0.33</td>\n",
       "      <td>1.49</td>\n",
       "      <td>59713444.23</td>\n",
       "      <td>44600690.01</td>\n",
       "      <td>2.035576e+07</td>\n",
       "      <td>66484532.02</td>\n",
       "      <td>38173392.41</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>260</th>\n",
       "      <td>18.269</td>\n",
       "      <td>680.47894</td>\n",
       "      <td>C02137;Phenylglyoxylic acid;inhouse</td>\n",
       "      <td>[M+UNK]+</td>\n",
       "      <td>C8H6O3</td>\n",
       "      <td>NaN</td>\n",
       "      <td>FAQJJMHZNSSFSM-UHFFFAOYSA-N</td>\n",
       "      <td>O=C(O)C(C1=CC=CC=C1)=O</td>\n",
       "      <td>True</td>\n",
       "      <td>11</td>\n",
       "      <td>0.27</td>\n",
       "      <td>1.51</td>\n",
       "      <td>84349130.07</td>\n",
       "      <td>78555602.89</td>\n",
       "      <td>2.734778e+07</td>\n",
       "      <td>91643089.02</td>\n",
       "      <td>69889740.61</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>261</th>\n",
       "      <td>19.400</td>\n",
       "      <td>680.47894</td>\n",
       "      <td>C16971;Hirsuteine;inhouse</td>\n",
       "      <td>[M+UNK]+</td>\n",
       "      <td>C22H26N2O3</td>\n",
       "      <td>NaN</td>\n",
       "      <td>TZUGIFAYWNNSAO-AZQGJTAVSA-N</td>\n",
       "      <td>[H][C@]1([C@](/C(C(OC)=O)=C\\OC)([H])C[C@]2([H]...</td>\n",
       "      <td>True</td>\n",
       "      <td>13</td>\n",
       "      <td>0.31</td>\n",
       "      <td>1.49</td>\n",
       "      <td>61973019.63</td>\n",
       "      <td>66551939.28</td>\n",
       "      <td>3.487791e+07</td>\n",
       "      <td>52158093.15</td>\n",
       "      <td>52135447.81</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>262</th>\n",
       "      <td>17.281</td>\n",
       "      <td>680.47931</td>\n",
       "      <td>C02137;Phenylglyoxylic acid;inhouse</td>\n",
       "      <td>[M+UNK]+</td>\n",
       "      <td>C8H6O3</td>\n",
       "      <td>NaN</td>\n",
       "      <td>FAQJJMHZNSSFSM-UHFFFAOYSA-N</td>\n",
       "      <td>O=C(O)C(C1=CC=CC=C1)=O</td>\n",
       "      <td>True</td>\n",
       "      <td>11</td>\n",
       "      <td>0.36</td>\n",
       "      <td>1.48</td>\n",
       "      <td>42503486.53</td>\n",
       "      <td>43969125.55</td>\n",
       "      <td>6.269818e+07</td>\n",
       "      <td>48961702.46</td>\n",
       "      <td>14322847.59</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>263</th>\n",
       "      <td>17.393</td>\n",
       "      <td>708.51068</td>\n",
       "      <td>C06426;Gamma-Linolenic acid;inhouse</td>\n",
       "      <td>[M+UNK]+</td>\n",
       "      <td>C18H30O2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>VZCCETWTMQHEPK-QNEBEIHSSA-N</td>\n",
       "      <td>CCCCC/C=C\\C/C=C\\C/C=C\\CCCCC(O)=O</td>\n",
       "      <td>True</td>\n",
       "      <td>12</td>\n",
       "      <td>0.25</td>\n",
       "      <td>1.42</td>\n",
       "      <td>84362176.03</td>\n",
       "      <td>83312543.36</td>\n",
       "      <td>1.359576e+08</td>\n",
       "      <td>98451097.33</td>\n",
       "      <td>18628403.98</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Average Rt(min)  Average Mz                         Metabolite name  \\\n",
       "259           18.717   680.47894  C03044;(2R,3R)-Butane-2,3-diol;inhouse   \n",
       "260           18.269   680.47894     C02137;Phenylglyoxylic acid;inhouse   \n",
       "261           19.400   680.47894               C16971;Hirsuteine;inhouse   \n",
       "262           17.281   680.47931     C02137;Phenylglyoxylic acid;inhouse   \n",
       "263           17.393   708.51068     C06426;Gamma-Linolenic acid;inhouse   \n",
       "\n",
       "    Adduct type     Formula Ontology                     INCHIKEY  \\\n",
       "259    [M+UNK]+     C4H10O2      NaN  OWBTYPJTUOEWEK-QWWZWVQMSA-N   \n",
       "260    [M+UNK]+      C8H6O3      NaN  FAQJJMHZNSSFSM-UHFFFAOYSA-N   \n",
       "261    [M+UNK]+  C22H26N2O3      NaN  TZUGIFAYWNNSAO-AZQGJTAVSA-N   \n",
       "262    [M+UNK]+      C8H6O3      NaN  FAQJJMHZNSSFSM-UHFFFAOYSA-N   \n",
       "263    [M+UNK]+    C18H30O2      NaN  VZCCETWTMQHEPK-QNEBEIHSSA-N   \n",
       "\n",
       "                                                SMILES  MS/MS matched  \\\n",
       "259                                C[C@@H](O)[C@H](O)C           True   \n",
       "260                             O=C(O)C(C1=CC=CC=C1)=O           True   \n",
       "261  [H][C@]1([C@](/C(C(OC)=O)=C\\OC)([H])C[C@]2([H]...           True   \n",
       "262                             O=C(O)C(C1=CC=CC=C1)=O           True   \n",
       "263                   CCCCC/C=C\\C/C=C\\C/C=C\\CCCCC(O)=O           True   \n",
       "\n",
       "     Matched peaks count  Matched peaks percentage  Total score         G1_2  \\\n",
       "259                   12                      0.33         1.49  59713444.23   \n",
       "260                   11                      0.27         1.51  84349130.07   \n",
       "261                   13                      0.31         1.49  61973019.63   \n",
       "262                   11                      0.36         1.48  42503486.53   \n",
       "263                   12                      0.25         1.42  84362176.03   \n",
       "\n",
       "            G1_3          G2_1         G2_2         G2_3  \n",
       "259  44600690.01  2.035576e+07  66484532.02  38173392.41  \n",
       "260  78555602.89  2.734778e+07  91643089.02  69889740.61  \n",
       "261  66551939.28  3.487791e+07  52158093.15  52135447.81  \n",
       "262  43969125.55  6.269818e+07  48961702.46  14322847.59  \n",
       "263  83312543.36  1.359576e+08  98451097.33  18628403.98  "
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.tail()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(264, 17)"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "RangeIndex(start=0, stop=264, step=1)"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['Average Rt(min)', 'Average Mz', 'Metabolite name', 'Adduct type',\n",
       "       'Formula', 'Ontology', 'INCHIKEY', 'SMILES', 'MS/MS matched',\n",
       "       'Matched peaks count', 'Matched peaks percentage', 'Total score',\n",
       "       'G1_2', 'G1_3', 'G2_1', 'G2_2', 'G2_3'],\n",
       "      dtype='object')"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.5.2 列数据、行数据引用"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Average Rt(min)</th>\n",
       "      <th>Average Mz</th>\n",
       "      <th>Matched peaks count</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1.616</td>\n",
       "      <td>72.08069</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1.458</td>\n",
       "      <td>72.08071</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1.674</td>\n",
       "      <td>74.05996</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1.250</td>\n",
       "      <td>76.03921</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1.627</td>\n",
       "      <td>79.02108</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>259</th>\n",
       "      <td>18.717</td>\n",
       "      <td>680.47894</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>260</th>\n",
       "      <td>18.269</td>\n",
       "      <td>680.47894</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>261</th>\n",
       "      <td>19.400</td>\n",
       "      <td>680.47894</td>\n",
       "      <td>13</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>262</th>\n",
       "      <td>17.281</td>\n",
       "      <td>680.47931</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>263</th>\n",
       "      <td>17.393</td>\n",
       "      <td>708.51068</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>264 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     Average Rt(min)  Average Mz  Matched peaks count\n",
       "0              1.616    72.08069                    7\n",
       "1              1.458    72.08071                    7\n",
       "2              1.674    74.05996                    3\n",
       "3              1.250    76.03921                    5\n",
       "4              1.627    79.02108                   10\n",
       "..               ...         ...                  ...\n",
       "259           18.717   680.47894                   12\n",
       "260           18.269   680.47894                   11\n",
       "261           19.400   680.47894                   13\n",
       "262           17.281   680.47931                   11\n",
       "263           17.393   708.51068                   12\n",
       "\n",
       "[264 rows x 3 columns]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 获取整列\n",
    "df[['Average Rt(min)', 'Average Mz','Matched peaks count']]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Average Rt(min)                                   1.257\n",
       "Average Mz                                     90.05482\n",
       "Metabolite name                     C00041;C00041;CFMID\n",
       "Adduct type                                      [M+H]+\n",
       "Formula                                         C3H7NO2\n",
       "Ontology                                            NaN\n",
       "INCHIKEY                    QNAYBMKLOCPYGJ-REOHCLBHSA-N\n",
       "SMILES                                  C[C@H](N)C(=O)O\n",
       "MS/MS matched                                      True\n",
       "Matched peaks count                                   3\n",
       "Matched peaks percentage                            1.0\n",
       "Total score                                        1.77\n",
       "G1_2                                        31850500.94\n",
       "G1_3                                        64628139.75\n",
       "G2_1                                        43734793.29\n",
       "G2_2                                        46828251.71\n",
       "G2_3                                        37470441.34\n",
       "Name: 9, dtype: object"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 获取一行\n",
    "df.iloc[9]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Average Rt(min)</th>\n",
       "      <th>Average Mz</th>\n",
       "      <th>Metabolite name</th>\n",
       "      <th>Adduct type</th>\n",
       "      <th>Formula</th>\n",
       "      <th>Ontology</th>\n",
       "      <th>INCHIKEY</th>\n",
       "      <th>SMILES</th>\n",
       "      <th>MS/MS matched</th>\n",
       "      <th>Matched peaks count</th>\n",
       "      <th>Matched peaks percentage</th>\n",
       "      <th>Total score</th>\n",
       "      <th>G1_2</th>\n",
       "      <th>G1_3</th>\n",
       "      <th>G2_1</th>\n",
       "      <th>G2_2</th>\n",
       "      <th>G2_3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1.616</td>\n",
       "      <td>72.08069</td>\n",
       "      <td>C12244;C12244;CFMID</td>\n",
       "      <td>[M+H]+</td>\n",
       "      <td>C4H9N</td>\n",
       "      <td>NaN</td>\n",
       "      <td>ASVKKRLMJCWVQF-UHFFFAOYSA-N</td>\n",
       "      <td>C=CCCN</td>\n",
       "      <td>True</td>\n",
       "      <td>7</td>\n",
       "      <td>0.57</td>\n",
       "      <td>1.47</td>\n",
       "      <td>2.153791e+08</td>\n",
       "      <td>3.298944e+08</td>\n",
       "      <td>3.024013e+08</td>\n",
       "      <td>2.554517e+08</td>\n",
       "      <td>2.960232e+08</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1.458</td>\n",
       "      <td>72.08071</td>\n",
       "      <td>C12244;C12244;CFMID</td>\n",
       "      <td>[M+H]+</td>\n",
       "      <td>C4H9N</td>\n",
       "      <td>NaN</td>\n",
       "      <td>ASVKKRLMJCWVQF-UHFFFAOYSA-N</td>\n",
       "      <td>C=CCCN</td>\n",
       "      <td>True</td>\n",
       "      <td>7</td>\n",
       "      <td>0.57</td>\n",
       "      <td>1.47</td>\n",
       "      <td>5.680004e+07</td>\n",
       "      <td>4.699128e+07</td>\n",
       "      <td>5.367473e+07</td>\n",
       "      <td>8.490319e+07</td>\n",
       "      <td>4.850104e+07</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1.674</td>\n",
       "      <td>74.05996</td>\n",
       "      <td>C00263;L-Homoserine;inhouse</td>\n",
       "      <td>[M-UNK]+</td>\n",
       "      <td>C4H9NO3</td>\n",
       "      <td>NaN</td>\n",
       "      <td>UKAUYVFTDYCKQA-VKHMYHEASA-N</td>\n",
       "      <td>N[C@H](C(O)=O)CCO</td>\n",
       "      <td>True</td>\n",
       "      <td>3</td>\n",
       "      <td>0.33</td>\n",
       "      <td>1.68</td>\n",
       "      <td>6.999451e+07</td>\n",
       "      <td>9.600511e+07</td>\n",
       "      <td>6.023802e+07</td>\n",
       "      <td>3.765911e+07</td>\n",
       "      <td>8.988370e+07</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Average Rt(min)  Average Mz              Metabolite name Adduct type  \\\n",
       "0            1.616    72.08069          C12244;C12244;CFMID      [M+H]+   \n",
       "1            1.458    72.08071          C12244;C12244;CFMID      [M+H]+   \n",
       "2            1.674    74.05996  C00263;L-Homoserine;inhouse    [M-UNK]+   \n",
       "\n",
       "   Formula Ontology                     INCHIKEY             SMILES  \\\n",
       "0    C4H9N      NaN  ASVKKRLMJCWVQF-UHFFFAOYSA-N             C=CCCN   \n",
       "1    C4H9N      NaN  ASVKKRLMJCWVQF-UHFFFAOYSA-N             C=CCCN   \n",
       "2  C4H9NO3      NaN  UKAUYVFTDYCKQA-VKHMYHEASA-N  N[C@H](C(O)=O)CCO   \n",
       "\n",
       "   MS/MS matched  Matched peaks count  Matched peaks percentage  Total score  \\\n",
       "0           True                    7                      0.57         1.47   \n",
       "1           True                    7                      0.57         1.47   \n",
       "2           True                    3                      0.33         1.68   \n",
       "\n",
       "           G1_2          G1_3          G2_1          G2_2          G2_3  \n",
       "0  2.153791e+08  3.298944e+08  3.024013e+08  2.554517e+08  2.960232e+08  \n",
       "1  5.680004e+07  4.699128e+07  5.367473e+07  8.490319e+07  4.850104e+07  \n",
       "2  6.999451e+07  9.600511e+07  6.023802e+07  3.765911e+07  8.988370e+07  "
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 获取多行（数据切片）\n",
    "#df中的行索引存放在了loc属性里\n",
    "\n",
    "df.loc[:2]\n",
    "\n",
    "#获取行编号0-2的行"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Average Rt(min)</th>\n",
       "      <th>Average Mz</th>\n",
       "      <th>Metabolite name</th>\n",
       "      <th>Adduct type</th>\n",
       "      <th>Formula</th>\n",
       "      <th>Ontology</th>\n",
       "      <th>INCHIKEY</th>\n",
       "      <th>SMILES</th>\n",
       "      <th>MS/MS matched</th>\n",
       "      <th>Matched peaks count</th>\n",
       "      <th>Matched peaks percentage</th>\n",
       "      <th>Total score</th>\n",
       "      <th>G1_2</th>\n",
       "      <th>G1_3</th>\n",
       "      <th>G2_1</th>\n",
       "      <th>G2_2</th>\n",
       "      <th>G2_3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1.458</td>\n",
       "      <td>72.08071</td>\n",
       "      <td>C12244;C12244;CFMID</td>\n",
       "      <td>[M+H]+</td>\n",
       "      <td>C4H9N</td>\n",
       "      <td>NaN</td>\n",
       "      <td>ASVKKRLMJCWVQF-UHFFFAOYSA-N</td>\n",
       "      <td>C=CCCN</td>\n",
       "      <td>True</td>\n",
       "      <td>7</td>\n",
       "      <td>0.57</td>\n",
       "      <td>1.47</td>\n",
       "      <td>5.680004e+07</td>\n",
       "      <td>4.699128e+07</td>\n",
       "      <td>5.367473e+07</td>\n",
       "      <td>8.490319e+07</td>\n",
       "      <td>4.850104e+07</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1.250</td>\n",
       "      <td>76.03921</td>\n",
       "      <td>C00037;Glycine; AIF; CE10; MS2Dec;MoNA</td>\n",
       "      <td>[M+H]+</td>\n",
       "      <td>C2H5NO2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>DHMQDGOQFOQNFH-UHFFFAOYSA-N</td>\n",
       "      <td>O=C(O)CN</td>\n",
       "      <td>True</td>\n",
       "      <td>5</td>\n",
       "      <td>0.40</td>\n",
       "      <td>1.58</td>\n",
       "      <td>4.955085e+07</td>\n",
       "      <td>8.532785e+07</td>\n",
       "      <td>7.484436e+07</td>\n",
       "      <td>7.181890e+07</td>\n",
       "      <td>5.938876e+07</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>1.912</td>\n",
       "      <td>104.10680</td>\n",
       "      <td>C00114;CHOLINE;MoNA</td>\n",
       "      <td>[M]+</td>\n",
       "      <td>C5H14NO</td>\n",
       "      <td>NaN</td>\n",
       "      <td>OEYIOHPDSNJKLS-UHFFFAOYSA-N</td>\n",
       "      <td>OCC[N+](C)(C)C</td>\n",
       "      <td>True</td>\n",
       "      <td>9</td>\n",
       "      <td>0.33</td>\n",
       "      <td>1.67</td>\n",
       "      <td>1.179760e+09</td>\n",
       "      <td>1.481314e+09</td>\n",
       "      <td>1.277634e+09</td>\n",
       "      <td>1.698198e+09</td>\n",
       "      <td>1.291239e+09</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>4.574</td>\n",
       "      <td>83.06027</td>\n",
       "      <td>C19262;C19262;CFMID</td>\n",
       "      <td>[M+H]+</td>\n",
       "      <td>C4H6N2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>XLSZMDLNRCVEIJ-UHFFFAOYSA-N</td>\n",
       "      <td>Cc1cnc[nH]1</td>\n",
       "      <td>True</td>\n",
       "      <td>5</td>\n",
       "      <td>0.60</td>\n",
       "      <td>1.42</td>\n",
       "      <td>1.410845e+07</td>\n",
       "      <td>8.694241e+07</td>\n",
       "      <td>1.247902e+08</td>\n",
       "      <td>3.864554e+07</td>\n",
       "      <td>3.952615e+07</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    Average Rt(min)  Average Mz                         Metabolite name  \\\n",
       "1             1.458    72.08071                     C12244;C12244;CFMID   \n",
       "3             1.250    76.03921  C00037;Glycine; AIF; CE10; MS2Dec;MoNA   \n",
       "16            1.912   104.10680                     C00114;CHOLINE;MoNA   \n",
       "7             4.574    83.06027                     C19262;C19262;CFMID   \n",
       "\n",
       "   Adduct type  Formula Ontology                     INCHIKEY          SMILES  \\\n",
       "1       [M+H]+    C4H9N      NaN  ASVKKRLMJCWVQF-UHFFFAOYSA-N          C=CCCN   \n",
       "3       [M+H]+  C2H5NO2      NaN  DHMQDGOQFOQNFH-UHFFFAOYSA-N        O=C(O)CN   \n",
       "16        [M]+  C5H14NO      NaN  OEYIOHPDSNJKLS-UHFFFAOYSA-N  OCC[N+](C)(C)C   \n",
       "7       [M+H]+   C4H6N2      NaN  XLSZMDLNRCVEIJ-UHFFFAOYSA-N     Cc1cnc[nH]1   \n",
       "\n",
       "    MS/MS matched  Matched peaks count  Matched peaks percentage  Total score  \\\n",
       "1            True                    7                      0.57         1.47   \n",
       "3            True                    5                      0.40         1.58   \n",
       "16           True                    9                      0.33         1.67   \n",
       "7            True                    5                      0.60         1.42   \n",
       "\n",
       "            G1_2          G1_3          G2_1          G2_2          G2_3  \n",
       "1   5.680004e+07  4.699128e+07  5.367473e+07  8.490319e+07  4.850104e+07  \n",
       "3   4.955085e+07  8.532785e+07  7.484436e+07  7.181890e+07  5.938876e+07  \n",
       "16  1.179760e+09  1.481314e+09  1.277634e+09  1.698198e+09  1.291239e+09  \n",
       "7   1.410845e+07  8.694241e+07  1.247902e+08  3.864554e+07  3.952615e+07  "
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 自定义多行,花式索引\n",
    "df.loc[[1,3,16,7]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Average Rt(min)</th>\n",
       "      <th>Formula</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1.458</td>\n",
       "      <td>C4H9N</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1.674</td>\n",
       "      <td>C4H9NO3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1.250</td>\n",
       "      <td>C2H5NO2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>1.912</td>\n",
       "      <td>C5H14NO</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    Average Rt(min)  Formula\n",
       "1             1.458    C4H9N\n",
       "2             1.674  C4H9NO3\n",
       "3             1.250  C2H5NO2\n",
       "16            1.912  C5H14NO"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "## 取行，也要取列\n",
    "\n",
    "df.loc[[1,2,3,16], ['Average Rt(min)', 'Formula']]"
   ]
  },
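  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `.loc` calls above select by index *label*, not by row position. The cell below is a minimal sketch of the difference between `.loc` and the position-based `.iloc`. The table `toy`, its column names, and its index labels are made up for illustration and are not part of the metabolomics data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy table; the index labels deliberately differ from row positions\n",
    "toy = pd.DataFrame(\n",
    "    {'rt': [1.616, 1.458, 1.674, 1.250],\n",
    "     'formula': ['C4H9N', 'C4H9N', 'C4H9NO3', 'C2H5NO2']},\n",
    "    index=[10, 11, 12, 13],\n",
    ")\n",
    "\n",
    "# .loc: rows by index label, columns by name\n",
    "by_label = toy.loc[[10, 13], ['rt', 'formula']]\n",
    "\n",
    "# .iloc: rows and columns by integer position\n",
    "by_position = toy.iloc[[0, 3], [0, 1]]\n",
    "\n",
    "# True when both selections hold the same cells and the same labels\n",
    "by_label.equals(by_position)"
   ]
  },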
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>smiles</th>\n",
       "      <th>name</th>\n",
       "      <th>MolWt</th>\n",
       "      <th>TPSA</th>\n",
       "      <th>nRotB</th>\n",
       "      <th>HBD</th>\n",
       "      <th>HBA</th>\n",
       "      <th>LogP</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>O=Cc1ccccc1</td>\n",
       "      <td>benzaldehyde|Benzaldehyd</td>\n",
       "      <td>106.041865</td>\n",
       "      <td>17.07</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1.4991</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>O=c1c(O)c(-c2ccc(O)c(O)c2)oc2cc(O)cc(O)c12</td>\n",
       "      <td>quercetol|2-(3,4-dihydroxyphenyl)-3,5,7-trihyd...</td>\n",
       "      <td>302.042653</td>\n",
       "      <td>131.36</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>7</td>\n",
       "      <td>1.9880</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>O=c1cc(-c2ccc(O)c(O)c2)oc2cc(O)cc(O)c12</td>\n",
       "      <td>2-(3,4-Dihydroxy-phenyl)-5,7-dihydroxy-chromen...</td>\n",
       "      <td>286.047738</td>\n",
       "      <td>111.13</td>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "      <td>6</td>\n",
       "      <td>2.2824</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>O=c1cc(-c2ccc(O)cc2)oc2cc(O)cc(O)c12</td>\n",
       "      <td>5,7-dihydroxy-2-(4-hydroxyphenyl)-4H-1-benzopy...</td>\n",
       "      <td>270.052823</td>\n",
       "      <td>90.90</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>5</td>\n",
       "      <td>2.5768</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                       smiles  \\\n",
       "0                                 O=Cc1ccccc1   \n",
       "1  O=c1c(O)c(-c2ccc(O)c(O)c2)oc2cc(O)cc(O)c12   \n",
       "2     O=c1cc(-c2ccc(O)c(O)c2)oc2cc(O)cc(O)c12   \n",
       "3        O=c1cc(-c2ccc(O)cc2)oc2cc(O)cc(O)c12   \n",
       "\n",
       "                                                name       MolWt    TPSA  \\\n",
       "0                           benzaldehyde|Benzaldehyd  106.041865   17.07   \n",
       "1  quercetol|2-(3,4-dihydroxyphenyl)-3,5,7-trihyd...  302.042653  131.36   \n",
       "2  2-(3,4-Dihydroxy-phenyl)-5,7-dihydroxy-chromen...  286.047738  111.13   \n",
       "3  5,7-dihydroxy-2-(4-hydroxyphenyl)-4H-1-benzopy...  270.052823   90.90   \n",
       "\n",
       "   nRotB  HBD  HBA    LogP  \n",
       "0      1    0    1  1.4991  \n",
       "1      1    5    7  1.9880  \n",
       "2      1    4    6  2.2824  \n",
       "3      1    3    5  2.5768  "
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 获得头几行\n",
    "df.head(4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>smiles</th>\n",
       "      <th>name</th>\n",
       "      <th>MolWt</th>\n",
       "      <th>TPSA</th>\n",
       "      <th>nRotB</th>\n",
       "      <th>HBD</th>\n",
       "      <th>HBA</th>\n",
       "      <th>LogP</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>87</th>\n",
       "      <td>O=C[C@H](O)[C@@H](O)[C@H](O)CO</td>\n",
       "      <td>D-xylose</td>\n",
       "      <td>150.052823</td>\n",
       "      <td>97.99</td>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "      <td>-2.7397</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>88</th>\n",
       "      <td>O=c1c(O)c(-c2ccc(O)cc2)oc2cc(O)cc(O)c12</td>\n",
       "      <td>kaempferol|3,5,7-trihydroxy-2-(4-hydroxyphenyl...</td>\n",
       "      <td>286.047738</td>\n",
       "      <td>111.13</td>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "      <td>6</td>\n",
       "      <td>2.2824</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>89</th>\n",
       "      <td>O=C(/C=C/c1ccc(O)c(O)c1)O[C@@H]1C[C@](O)(C(=O)...</td>\n",
       "      <td>3-caffeoylquinic acid|chlorogenic acid</td>\n",
       "      <td>354.095082</td>\n",
       "      <td>164.75</td>\n",
       "      <td>4</td>\n",
       "      <td>6</td>\n",
       "      <td>8</td>\n",
       "      <td>-0.6459</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>90</th>\n",
       "      <td>O=C(O)c1cc(O)c(O)c(O)c1</td>\n",
       "      <td>3,4,5-trihydroxybenzoic acid|gallic acid|3,4,5...</td>\n",
       "      <td>170.021523</td>\n",
       "      <td>97.99</td>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "      <td>0.5016</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>91</th>\n",
       "      <td>O=C[C@H](O)[C@@H](O)[C@@H](O)[C@H](O)CO</td>\n",
       "      <td>D-Galactose</td>\n",
       "      <td>180.063388</td>\n",
       "      <td>118.22</td>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "      <td>-3.3788</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                               smiles  \\\n",
       "87                     O=C[C@H](O)[C@@H](O)[C@H](O)CO   \n",
       "88            O=c1c(O)c(-c2ccc(O)cc2)oc2cc(O)cc(O)c12   \n",
       "89  O=C(/C=C/c1ccc(O)c(O)c1)O[C@@H]1C[C@](O)(C(=O)...   \n",
       "90                            O=C(O)c1cc(O)c(O)c(O)c1   \n",
       "91            O=C[C@H](O)[C@@H](O)[C@@H](O)[C@H](O)CO   \n",
       "\n",
       "                                                 name       MolWt    TPSA  \\\n",
       "87                                           D-xylose  150.052823   97.99   \n",
       "88  kaempferol|3,5,7-trihydroxy-2-(4-hydroxyphenyl...  286.047738  111.13   \n",
       "89             3-caffeoylquinic acid|chlorogenic acid  354.095082  164.75   \n",
       "90  3,4,5-trihydroxybenzoic acid|gallic acid|3,4,5...  170.021523   97.99   \n",
       "91                                        D-Galactose  180.063388  118.22   \n",
       "\n",
       "    nRotB  HBD  HBA    LogP  \n",
       "87      4    4    5 -2.7397  \n",
       "88      1    4    6  2.2824  \n",
       "89      4    6    8 -0.6459  \n",
       "90      1    4    4  0.5016  \n",
       "91      5    5    6 -3.3788  "
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 获得尾几行\n",
    "df.tail(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "numpy.ndarray"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = df.loc[[1,2,3,7], ['G2_1', 'G2_2', 'G2_3']].values\n",
    "type(data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>G2_1</th>\n",
       "      <th>G2_2</th>\n",
       "      <th>G2_3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>5.367473e+07</td>\n",
       "      <td>84903186.23</td>\n",
       "      <td>48501044.77</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>6.023802e+07</td>\n",
       "      <td>37659110.47</td>\n",
       "      <td>89883700.70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>7.484436e+07</td>\n",
       "      <td>71818900.24</td>\n",
       "      <td>59388762.24</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>1.247902e+08</td>\n",
       "      <td>38645540.00</td>\n",
       "      <td>39526152.22</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           G2_1         G2_2         G2_3\n",
       "1  5.367473e+07  84903186.23  48501044.77\n",
       "2  6.023802e+07  37659110.47  89883700.70\n",
       "3  7.484436e+07  71818900.24  59388762.24\n",
       "7  1.247902e+08  38645540.00  39526152.22"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[[1,2,3,7], ['G2_1', 'G2_2', 'G2_3']]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[5.36747260e+07, 8.49031862e+07, 4.85010448e+07],\n",
       "       [6.02380160e+07, 3.76591105e+07, 8.98837007e+07],\n",
       "       [7.48443597e+07, 7.18189002e+07, 5.93887622e+07],\n",
       "       [1.24790224e+08, 3.86455400e+07, 3.95261522e+07]])"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data"
   ]
  },
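  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `.values` attribute used above still works, but the pandas documentation recommends `DataFrame.to_numpy()` for the same conversion. A minimal sketch on a made-up two-column table (`toy` and its numbers are illustrative assumptions):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical intensities; only the column names mirror the real table\n",
    "toy = pd.DataFrame({'G2_1': [1.0, 2.0], 'G2_2': [3.0, 4.0]})\n",
    "\n",
    "# to_numpy() returns a plain NumPy array; index and column labels are not carried over\n",
    "arr = toy[['G2_1', 'G2_2']].to_numpy()\n",
    "\n",
    "type(arr), arr.shape"
   ]
  },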
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.3.2 DataFrame 的增改\n",
    "#### 整列改变"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Average Rt(min)</th>\n",
       "      <th>Average Mz</th>\n",
       "      <th>Metabolite name</th>\n",
       "      <th>Adduct type</th>\n",
       "      <th>Formula</th>\n",
       "      <th>Ontology</th>\n",
       "      <th>INCHIKEY</th>\n",
       "      <th>SMILES</th>\n",
       "      <th>MS/MS matched</th>\n",
       "      <th>Matched peaks count</th>\n",
       "      <th>Matched peaks percentage</th>\n",
       "      <th>Total score</th>\n",
       "      <th>G1_2</th>\n",
       "      <th>G1_3</th>\n",
       "      <th>G2_1</th>\n",
       "      <th>G2_2</th>\n",
       "      <th>G2_3</th>\n",
       "      <th>log_G2_1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1.616</td>\n",
       "      <td>72.08069</td>\n",
       "      <td>C12244;C12244;CFMID</td>\n",
       "      <td>[M+H]+</td>\n",
       "      <td>C4H9N</td>\n",
       "      <td>NaN</td>\n",
       "      <td>ASVKKRLMJCWVQF-UHFFFAOYSA-N</td>\n",
       "      <td>C=CCCN</td>\n",
       "      <td>True</td>\n",
       "      <td>7</td>\n",
       "      <td>0.57</td>\n",
       "      <td>1.47</td>\n",
       "      <td>2.153791e+08</td>\n",
       "      <td>3.298944e+08</td>\n",
       "      <td>3.024013e+08</td>\n",
       "      <td>2.554517e+08</td>\n",
       "      <td>2.960232e+08</td>\n",
       "      <td>8.480584</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1.458</td>\n",
       "      <td>72.08071</td>\n",
       "      <td>C12244;C12244;CFMID</td>\n",
       "      <td>[M+H]+</td>\n",
       "      <td>C4H9N</td>\n",
       "      <td>NaN</td>\n",
       "      <td>ASVKKRLMJCWVQF-UHFFFAOYSA-N</td>\n",
       "      <td>C=CCCN</td>\n",
       "      <td>True</td>\n",
       "      <td>7</td>\n",
       "      <td>0.57</td>\n",
       "      <td>1.47</td>\n",
       "      <td>5.680004e+07</td>\n",
       "      <td>4.699128e+07</td>\n",
       "      <td>5.367473e+07</td>\n",
       "      <td>8.490319e+07</td>\n",
       "      <td>4.850104e+07</td>\n",
       "      <td>7.729770</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1.674</td>\n",
       "      <td>74.05996</td>\n",
       "      <td>C00263;L-Homoserine;inhouse</td>\n",
       "      <td>[M-UNK]+</td>\n",
       "      <td>C4H9NO3</td>\n",
       "      <td>NaN</td>\n",
       "      <td>UKAUYVFTDYCKQA-VKHMYHEASA-N</td>\n",
       "      <td>N[C@H](C(O)=O)CCO</td>\n",
       "      <td>True</td>\n",
       "      <td>3</td>\n",
       "      <td>0.33</td>\n",
       "      <td>1.68</td>\n",
       "      <td>6.999451e+07</td>\n",
       "      <td>9.600511e+07</td>\n",
       "      <td>6.023802e+07</td>\n",
       "      <td>3.765911e+07</td>\n",
       "      <td>8.988370e+07</td>\n",
       "      <td>7.779871</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Average Rt(min)  Average Mz              Metabolite name Adduct type  \\\n",
       "0            1.616    72.08069          C12244;C12244;CFMID      [M+H]+   \n",
       "1            1.458    72.08071          C12244;C12244;CFMID      [M+H]+   \n",
       "2            1.674    74.05996  C00263;L-Homoserine;inhouse    [M-UNK]+   \n",
       "\n",
       "   Formula Ontology                     INCHIKEY             SMILES  \\\n",
       "0    C4H9N      NaN  ASVKKRLMJCWVQF-UHFFFAOYSA-N             C=CCCN   \n",
       "1    C4H9N      NaN  ASVKKRLMJCWVQF-UHFFFAOYSA-N             C=CCCN   \n",
       "2  C4H9NO3      NaN  UKAUYVFTDYCKQA-VKHMYHEASA-N  N[C@H](C(O)=O)CCO   \n",
       "\n",
       "   MS/MS matched  Matched peaks count  Matched peaks percentage  Total score  \\\n",
       "0           True                    7                      0.57         1.47   \n",
       "1           True                    7                      0.57         1.47   \n",
       "2           True                    3                      0.33         1.68   \n",
       "\n",
       "           G1_2          G1_3          G2_1          G2_2          G2_3  \\\n",
       "0  2.153791e+08  3.298944e+08  3.024013e+08  2.554517e+08  2.960232e+08   \n",
       "1  5.680004e+07  4.699128e+07  5.367473e+07  8.490319e+07  4.850104e+07   \n",
       "2  6.999451e+07  9.600511e+07  6.023802e+07  3.765911e+07  8.988370e+07   \n",
       "\n",
       "   log_G2_1  \n",
       "0  8.480584  \n",
       "1  7.729770  \n",
       "2  7.779871  "
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "# 通常做类似出列\n",
    "df['log_G2_1'] = np.log10(df['G2_1'])\n",
    "df.head(3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>G2_1</th>\n",
       "      <th>log_G2_1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3.024013e+08</td>\n",
       "      <td>8.480584</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>5.367473e+07</td>\n",
       "      <td>7.729770</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>6.023802e+07</td>\n",
       "      <td>7.779871</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>7.484436e+07</td>\n",
       "      <td>7.874159</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>9.390745e+09</td>\n",
       "      <td>9.972700</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>259</th>\n",
       "      <td>2.035576e+07</td>\n",
       "      <td>7.308687</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>260</th>\n",
       "      <td>2.734778e+07</td>\n",
       "      <td>7.436922</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>261</th>\n",
       "      <td>3.487791e+07</td>\n",
       "      <td>7.542550</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>262</th>\n",
       "      <td>6.269818e+07</td>\n",
       "      <td>7.797255</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>263</th>\n",
       "      <td>1.359576e+08</td>\n",
       "      <td>8.133403</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>264 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "             G2_1  log_G2_1\n",
       "0    3.024013e+08  8.480584\n",
       "1    5.367473e+07  7.729770\n",
       "2    6.023802e+07  7.779871\n",
       "3    7.484436e+07  7.874159\n",
       "4    9.390745e+09  9.972700\n",
       "..            ...       ...\n",
       "259  2.035576e+07  7.308687\n",
       "260  2.734778e+07  7.436922\n",
       "261  3.487791e+07  7.542550\n",
       "262  6.269818e+07  7.797255\n",
       "263  1.359576e+08  8.133403\n",
       "\n",
       "[264 rows x 2 columns]"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[['G2_1', 'log_G2_1']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 增加行和列"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_new = df.sample(20).copy()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.concat([df, df_new])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(284, 18)"
      ]
     },
     "execution_count": 57,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.shape"
   ]
  },
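  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because `pd.concat` keeps each row's original index label, appending rows sampled from the same table leaves repeated labels in the index. The sketch below shows one way to check for this and to renumber the rows with `ignore_index=True`. The tables `base` and `extra` are made-up stand-ins, and `random_state=0` is only there to make the sample repeatable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical stand-ins for df and df_new\n",
    "base = pd.DataFrame({'x': [1, 2, 3]})\n",
    "extra = base.sample(2, random_state=0)\n",
    "\n",
    "# Default behaviour: original labels are kept, so two of them repeat\n",
    "kept = pd.concat([base, extra])\n",
    "\n",
    "# ignore_index=True discards the old labels and numbers the rows 0..n-1\n",
    "renumbered = pd.concat([base, extra], ignore_index=True)\n",
    "\n",
    "kept.index.is_unique, renumbered.index.is_unique"
   ]
  },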
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {},
   "outputs": [],
   "source": [
    "df2 = df.copy()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Average Rt(min)</th>\n",
       "      <th>Average Mz</th>\n",
       "      <th>Metabolite name</th>\n",
       "      <th>Adduct type</th>\n",
       "      <th>Formula</th>\n",
       "      <th>Ontology</th>\n",
       "      <th>INCHIKEY</th>\n",
       "      <th>SMILES</th>\n",
       "      <th>MS/MS matched</th>\n",
       "      <th>Matched peaks count</th>\n",
       "      <th>Matched peaks percentage</th>\n",
       "      <th>Total score</th>\n",
       "      <th>G1_2</th>\n",
       "      <th>G1_3</th>\n",
       "      <th>G2_1</th>\n",
       "      <th>G2_2</th>\n",
       "      <th>G2_3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1.616</td>\n",
       "      <td>72.08069</td>\n",
       "      <td>C12244;C12244;CFMID</td>\n",
       "      <td>[M+H]+</td>\n",
       "      <td>C4H9N</td>\n",
       "      <td>NaN</td>\n",
       "      <td>ASVKKRLMJCWVQF-UHFFFAOYSA-N</td>\n",
       "      <td>C=CCCN</td>\n",
       "      <td>True</td>\n",
       "      <td>7</td>\n",
       "      <td>0.57</td>\n",
       "      <td>1.47</td>\n",
       "      <td>2.153791e+08</td>\n",
       "      <td>3.298944e+08</td>\n",
       "      <td>3.024013e+08</td>\n",
       "      <td>2.554517e+08</td>\n",
       "      <td>2.960232e+08</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1.458</td>\n",
       "      <td>72.08071</td>\n",
       "      <td>C12244;C12244;CFMID</td>\n",
       "      <td>[M+H]+</td>\n",
       "      <td>C4H9N</td>\n",
       "      <td>NaN</td>\n",
       "      <td>ASVKKRLMJCWVQF-UHFFFAOYSA-N</td>\n",
       "      <td>C=CCCN</td>\n",
       "      <td>True</td>\n",
       "      <td>7</td>\n",
       "      <td>0.57</td>\n",
       "      <td>1.47</td>\n",
       "      <td>5.680004e+07</td>\n",
       "      <td>4.699128e+07</td>\n",
       "      <td>5.367473e+07</td>\n",
       "      <td>8.490319e+07</td>\n",
       "      <td>4.850104e+07</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1.674</td>\n",
       "      <td>74.05996</td>\n",
       "      <td>C00263;L-Homoserine;inhouse</td>\n",
       "      <td>[M-UNK]+</td>\n",
       "      <td>C4H9NO3</td>\n",
       "      <td>NaN</td>\n",
       "      <td>UKAUYVFTDYCKQA-VKHMYHEASA-N</td>\n",
       "      <td>N[C@H](C(O)=O)CCO</td>\n",
       "      <td>True</td>\n",
       "      <td>3</td>\n",
       "      <td>0.33</td>\n",
       "      <td>1.68</td>\n",
       "      <td>6.999451e+07</td>\n",
       "      <td>9.600511e+07</td>\n",
       "      <td>6.023802e+07</td>\n",
       "      <td>3.765911e+07</td>\n",
       "      <td>8.988370e+07</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1.250</td>\n",
       "      <td>76.03921</td>\n",
       "      <td>C00037;Glycine; AIF; CE10; MS2Dec;MoNA</td>\n",
       "      <td>[M+H]+</td>\n",
       "      <td>C2H5NO2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>DHMQDGOQFOQNFH-UHFFFAOYSA-N</td>\n",
       "      <td>O=C(O)CN</td>\n",
       "      <td>True</td>\n",
       "      <td>5</td>\n",
       "      <td>0.40</td>\n",
       "      <td>1.58</td>\n",
       "      <td>4.955085e+07</td>\n",
       "      <td>8.532785e+07</td>\n",
       "      <td>7.484436e+07</td>\n",
       "      <td>7.181890e+07</td>\n",
       "      <td>5.938876e+07</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1.627</td>\n",
       "      <td>79.02108</td>\n",
       "      <td>C11143;Dimethyl?sulfoxide;msd</td>\n",
       "      <td>[M+H]+</td>\n",
       "      <td>C2H6OS</td>\n",
       "      <td>Sulfoxides</td>\n",
       "      <td>IAZDPXIOMUYVGZ-UHFFFAOYSA-N</td>\n",
       "      <td>O=S(C)C</td>\n",
       "      <td>True</td>\n",
       "      <td>10</td>\n",
       "      <td>0.40</td>\n",
       "      <td>1.68</td>\n",
       "      <td>2.336511e+10</td>\n",
       "      <td>6.540745e+09</td>\n",
       "      <td>9.390745e+09</td>\n",
       "      <td>9.984601e+09</td>\n",
       "      <td>8.283881e+09</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>230</th>\n",
       "      <td>13.518</td>\n",
       "      <td>400.34183</td>\n",
       "      <td>C02990;Palmitoyl-carnitine; AIF; CE30; MS2Dec;...</td>\n",
       "      <td>[M+H]+</td>\n",
       "      <td>C23H45NO4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>XOMRRQXKHMYMOC-OAQYLSRUSA-N</td>\n",
       "      <td>O=C([O-])CC(OC(=O)CCCCCCCCCCCCCCC)C[N+](C)(C)C</td>\n",
       "      <td>True</td>\n",
       "      <td>5</td>\n",
       "      <td>0.40</td>\n",
       "      <td>1.64</td>\n",
       "      <td>1.430898e+05</td>\n",
       "      <td>3.530477e+05</td>\n",
       "      <td>8.265806e+04</td>\n",
       "      <td>4.275860e+04</td>\n",
       "      <td>6.046636e+04</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>222</th>\n",
       "      <td>17.728</td>\n",
       "      <td>400.34155</td>\n",
       "      <td>C02990;Palmitoyl-carnitine; AIF; CE30; MS2Dec;...</td>\n",
       "      <td>[M+H]+</td>\n",
       "      <td>C23H45NO4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>XOMRRQXKHMYMOC-OAQYLSRUSA-N</td>\n",
       "      <td>O=C([O-])CC(OC(=O)CCCCCCCCCCCCCCC)C[N+](C)(C)C</td>\n",
       "      <td>True</td>\n",
       "      <td>5</td>\n",
       "      <td>0.40</td>\n",
       "      <td>1.64</td>\n",
       "      <td>3.431087e+04</td>\n",
       "      <td>7.730116e+04</td>\n",
       "      <td>1.648881e+03</td>\n",
       "      <td>7.254747e+03</td>\n",
       "      <td>5.183546e+03</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>224</th>\n",
       "      <td>16.146</td>\n",
       "      <td>400.34158</td>\n",
       "      <td>C02990;Palmitoyl-carnitine; AIF; CE30; MS2Dec;...</td>\n",
       "      <td>[M+H]+</td>\n",
       "      <td>C23H45NO4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>XOMRRQXKHMYMOC-OAQYLSRUSA-N</td>\n",
       "      <td>O=C([O-])CC(OC(=O)CCCCCCCCCCCCCCC)C[N+](C)(C)C</td>\n",
       "      <td>True</td>\n",
       "      <td>5</td>\n",
       "      <td>0.60</td>\n",
       "      <td>1.67</td>\n",
       "      <td>6.617041e+05</td>\n",
       "      <td>1.294743e+05</td>\n",
       "      <td>1.032811e+04</td>\n",
       "      <td>2.032350e+03</td>\n",
       "      <td>1.340328e+04</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>236</th>\n",
       "      <td>18.345</td>\n",
       "      <td>408.36795</td>\n",
       "      <td>C10354;2-(Hydroxymethyl)anthraquinone;inhouse</td>\n",
       "      <td>[M+UNK]+</td>\n",
       "      <td>C15H10O3</td>\n",
       "      <td>NaN</td>\n",
       "      <td>JYKHAJGLEVKEAA-UHFFFAOYSA-N</td>\n",
       "      <td>O=C1C2=C(C=C(CO)C=C2)C(C3=C1C=CC=C3)=O</td>\n",
       "      <td>True</td>\n",
       "      <td>21</td>\n",
       "      <td>0.33</td>\n",
       "      <td>1.70</td>\n",
       "      <td>1.549098e+07</td>\n",
       "      <td>2.959743e+07</td>\n",
       "      <td>5.768400e+06</td>\n",
       "      <td>2.313568e+07</td>\n",
       "      <td>1.431721e+07</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>8.787</td>\n",
       "      <td>107.07011</td>\n",
       "      <td>C14689;C14689;CFMID</td>\n",
       "      <td>[M+H]+</td>\n",
       "      <td>C4H10O3</td>\n",
       "      <td>NaN</td>\n",
       "      <td>MTHSVFCYNBDYFN-UHFFFAOYSA-N</td>\n",
       "      <td>OCCOCCO</td>\n",
       "      <td>True</td>\n",
       "      <td>3</td>\n",
       "      <td>0.33</td>\n",
       "      <td>1.66</td>\n",
       "      <td>2.150807e+07</td>\n",
       "      <td>2.226414e+07</td>\n",
       "      <td>2.256490e+07</td>\n",
       "      <td>2.302351e+07</td>\n",
       "      <td>2.047077e+07</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>284 rows × 17 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     Average Rt(min)  Average Mz  \\\n",
       "0              1.616    72.08069   \n",
       "1              1.458    72.08071   \n",
       "2              1.674    74.05996   \n",
       "3              1.250    76.03921   \n",
       "4              1.627    79.02108   \n",
       "..               ...         ...   \n",
       "230           13.518   400.34183   \n",
       "222           17.728   400.34155   \n",
       "224           16.146   400.34158   \n",
       "236           18.345   408.36795   \n",
       "17             8.787   107.07011   \n",
       "\n",
       "                                       Metabolite name Adduct type    Formula  \\\n",
       "0                                  C12244;C12244;CFMID      [M+H]+      C4H9N   \n",
       "1                                  C12244;C12244;CFMID      [M+H]+      C4H9N   \n",
       "2                          C00263;L-Homoserine;inhouse    [M-UNK]+    C4H9NO3   \n",
       "3               C00037;Glycine; AIF; CE10; MS2Dec;MoNA      [M+H]+    C2H5NO2   \n",
       "4                        C11143;Dimethyl?sulfoxide;msd      [M+H]+     C2H6OS   \n",
       "..                                                 ...         ...        ...   \n",
       "230  C02990;Palmitoyl-carnitine; AIF; CE30; MS2Dec;...      [M+H]+  C23H45NO4   \n",
       "222  C02990;Palmitoyl-carnitine; AIF; CE30; MS2Dec;...      [M+H]+  C23H45NO4   \n",
       "224  C02990;Palmitoyl-carnitine; AIF; CE30; MS2Dec;...      [M+H]+  C23H45NO4   \n",
       "236      C10354;2-(Hydroxymethyl)anthraquinone;inhouse    [M+UNK]+   C15H10O3   \n",
       "17                                 C14689;C14689;CFMID      [M+H]+    C4H10O3   \n",
       "\n",
       "       Ontology                     INCHIKEY  \\\n",
       "0           NaN  ASVKKRLMJCWVQF-UHFFFAOYSA-N   \n",
       "1           NaN  ASVKKRLMJCWVQF-UHFFFAOYSA-N   \n",
       "2           NaN  UKAUYVFTDYCKQA-VKHMYHEASA-N   \n",
       "3           NaN  DHMQDGOQFOQNFH-UHFFFAOYSA-N   \n",
       "4    Sulfoxides  IAZDPXIOMUYVGZ-UHFFFAOYSA-N   \n",
       "..          ...                          ...   \n",
       "230         NaN  XOMRRQXKHMYMOC-OAQYLSRUSA-N   \n",
       "222         NaN  XOMRRQXKHMYMOC-OAQYLSRUSA-N   \n",
       "224         NaN  XOMRRQXKHMYMOC-OAQYLSRUSA-N   \n",
       "236         NaN  JYKHAJGLEVKEAA-UHFFFAOYSA-N   \n",
       "17          NaN  MTHSVFCYNBDYFN-UHFFFAOYSA-N   \n",
       "\n",
       "                                             SMILES  MS/MS matched  \\\n",
       "0                                            C=CCCN           True   \n",
       "1                                            C=CCCN           True   \n",
       "2                                 N[C@H](C(O)=O)CCO           True   \n",
       "3                                          O=C(O)CN           True   \n",
       "4                                           O=S(C)C           True   \n",
       "..                                              ...            ...   \n",
       "230  O=C([O-])CC(OC(=O)CCCCCCCCCCCCCCC)C[N+](C)(C)C           True   \n",
       "222  O=C([O-])CC(OC(=O)CCCCCCCCCCCCCCC)C[N+](C)(C)C           True   \n",
       "224  O=C([O-])CC(OC(=O)CCCCCCCCCCCCCCC)C[N+](C)(C)C           True   \n",
       "236          O=C1C2=C(C=C(CO)C=C2)C(C3=C1C=CC=C3)=O           True   \n",
       "17                                          OCCOCCO           True   \n",
       "\n",
       "     Matched peaks count  Matched peaks percentage  Total score          G1_2  \\\n",
       "0                      7                      0.57         1.47  2.153791e+08   \n",
       "1                      7                      0.57         1.47  5.680004e+07   \n",
       "2                      3                      0.33         1.68  6.999451e+07   \n",
       "3                      5                      0.40         1.58  4.955085e+07   \n",
       "4                     10                      0.40         1.68  2.336511e+10   \n",
       "..                   ...                       ...          ...           ...   \n",
       "230                    5                      0.40         1.64  1.430898e+05   \n",
       "222                    5                      0.40         1.64  3.431087e+04   \n",
       "224                    5                      0.60         1.67  6.617041e+05   \n",
       "236                   21                      0.33         1.70  1.549098e+07   \n",
       "17                     3                      0.33         1.66  2.150807e+07   \n",
       "\n",
       "             G1_3          G2_1          G2_2          G2_3  \n",
       "0    3.298944e+08  3.024013e+08  2.554517e+08  2.960232e+08  \n",
       "1    4.699128e+07  5.367473e+07  8.490319e+07  4.850104e+07  \n",
       "2    9.600511e+07  6.023802e+07  3.765911e+07  8.988370e+07  \n",
       "3    8.532785e+07  7.484436e+07  7.181890e+07  5.938876e+07  \n",
       "4    6.540745e+09  9.390745e+09  9.984601e+09  8.283881e+09  \n",
       "..            ...           ...           ...           ...  \n",
       "230  3.530477e+05  8.265806e+04  4.275860e+04  6.046636e+04  \n",
       "222  7.730116e+04  1.648881e+03  7.254747e+03  5.183546e+03  \n",
       "224  1.294743e+05  1.032811e+04  2.032350e+03  1.340328e+04  \n",
       "236  2.959743e+07  5.768400e+06  2.313568e+07  1.431721e+07  \n",
       "17   2.226414e+07  2.256490e+07  2.302351e+07  2.047077e+07  \n",
       "\n",
       "[284 rows x 17 columns]"
      ]
     },
     "execution_count": 65,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Drop a column\n",
    "df2 = df2.drop('log_G2_1', axis=1)\n",
    "# axis=1 tells drop() to look for the label among the columns rather than the rows\n",
    "df2"
   ]
  },
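  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cell above drops a column with `axis=1`. As a minimal sketch on a hypothetical toy table (the column names are borrowed from this notebook, the values are invented), `drop(columns=...)` is an equivalent spelling, and both forms return a new DataFrame rather than changing the original:\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy table; the values are invented for illustration\n",
    "toy = pd.DataFrame({'G2_1': [10.0, 100.0], 'log_G2_1': [1.0, 2.0]})\n",
    "\n",
    "# Two equivalent ways to drop a column; each returns a new DataFrame\n",
    "by_axis = toy.drop('log_G2_1', axis=1)\n",
    "by_keyword = toy.drop(columns=['log_G2_1'])\n",
    "\n",
    "print(list(by_axis.columns))   # columns left after the drop\n",
    "print(list(toy.columns))       # toy itself still has both columns\n",
    "```\n",
    "\n",
    "This is why the notebook writes `df2 = df2.drop(...)`: without the assignment the result would be displayed and then discarded."
   ]
  },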
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.3.3 DataFrame summary information"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "5112"
      ]
     },
     "execution_count": 66,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Total number of elements in the DataFrame (rows × columns)\n",
    "df.size  # see the pandas DataFrame API reference for related attributes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "Index: 284 entries, 0 to 17\n",
      "Data columns (total 18 columns):\n",
      " #   Column                    Non-Null Count  Dtype  \n",
      "---  ------                    --------------  -----  \n",
      " 0   Average Rt(min)           284 non-null    float64\n",
      " 1   Average Mz                284 non-null    float64\n",
      " 2   Metabolite name           284 non-null    object \n",
      " 3   Adduct type               284 non-null    object \n",
      " 4   Formula                   284 non-null    object \n",
      " 5   Ontology                  49 non-null     object \n",
      " 6   INCHIKEY                  284 non-null    object \n",
      " 7   SMILES                    284 non-null    object \n",
      " 8   MS/MS matched             284 non-null    bool   \n",
      " 9   Matched peaks count       284 non-null    int64  \n",
      " 10  Matched peaks percentage  284 non-null    float64\n",
      " 11  Total score               284 non-null    float64\n",
      " 12  G1_2                      284 non-null    float64\n",
      " 13  G1_3                      284 non-null    float64\n",
      " 14  G2_1                      284 non-null    float64\n",
      " 15  G2_2                      284 non-null    float64\n",
      " 16  G2_3                      284 non-null    float64\n",
      " 17  log_G2_1                  284 non-null    float64\n",
      "dtypes: bool(1), float64(10), int64(1), object(6)\n",
      "memory usage: 40.2+ KB\n"
     ]
    }
   ],
   "source": [
    "# Show basic information: index range, column dtypes, non-null counts, memory usage\n",
    "df.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Average Rt(min)             float64\n",
       "Average Mz                  float64\n",
       "Metabolite name              object\n",
       "Adduct type                  object\n",
       "Formula                      object\n",
       "Ontology                     object\n",
       "INCHIKEY                     object\n",
       "SMILES                       object\n",
       "MS/MS matched                  bool\n",
       "Matched peaks count           int64\n",
       "Matched peaks percentage    float64\n",
       "Total score                 float64\n",
       "G1_2                        float64\n",
       "G1_3                        float64\n",
       "G2_1                        float64\n",
       "G2_2                        float64\n",
       "G2_3                        float64\n",
       "log_G2_1                    float64\n",
       "dtype: object"
      ]
     },
     "execution_count": 68,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Show the data type of each column\n",
    "df.dtypes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "count    2.840000e+02\n",
       "mean     4.281146e+08\n",
       "std      1.679414e+09\n",
       "min      1.178917e+03\n",
       "25%      1.399581e+07\n",
       "50%      4.070286e+07\n",
       "75%      1.549492e+08\n",
       "max      1.913928e+10\n",
       "Name: G2_1, dtype: float64"
      ]
     },
     "execution_count": 69,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Summary statistics (count, mean, std, min, quartiles, max) for a single column\n",
    "df['G2_1'].describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "144"
      ]
     },
     "execution_count": 74,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Number of distinct values in the INCHIKEY column\n",
    "df['INCHIKEY'].nunique()"
   ]
  },
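  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`df.size` counts cells, so it equals rows × columns (284 × 18 = 5112 for the table above). A small sketch on an invented toy table (only the column names come from this notebook) showing how `shape`, `size`, `nunique()` and a missing-value count relate:\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy table; the values are invented for illustration\n",
    "toy = pd.DataFrame({\n",
    "    'INCHIKEY': ['AAA', 'AAA', 'BBB'],\n",
    "    'Ontology': ['Sulfoxides', None, None],\n",
    "    'G2_1': [1.0, 2.0, 3.0],\n",
    "})\n",
    "\n",
    "n_rows, n_cols = toy.shape                # (rows, columns)\n",
    "n_cells = toy.size                        # rows * columns\n",
    "n_unique = toy['INCHIKEY'].nunique()      # distinct values in one column\n",
    "n_missing = toy['Ontology'].isna().sum()  # missing entries in one column\n",
    "\n",
    "print(n_rows, n_cols, n_cells, n_unique, n_missing)\n",
    "```\n",
    "\n",
    "The missing-value count is the complement of the `Non-Null Count` column that `info()` reports."
   ]
  },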
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.3.3 DataFrame sorting"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sort by a single column in descending order; inplace=True modifies df itself\n",
    "df.sort_values(by='Average Mz', ascending=False, inplace=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>smiles</th>\n",
       "      <th>name</th>\n",
       "      <th>MolWt</th>\n",
       "      <th>TPSA</th>\n",
       "      <th>nRotB</th>\n",
       "      <th>HBD</th>\n",
       "      <th>HBA</th>\n",
       "      <th>LogP</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>76</th>\n",
       "      <td>COc1cc(O)c2c(=O)c(O[C@@H]3OC[C@@](O)(CO[C@@H]4...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>726.200729</td>\n",
       "      <td>297.12</td>\n",
       "      <td>11</td>\n",
       "      <td>10</td>\n",
       "      <td>19</td>\n",
       "      <td>-3.36280</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>86</th>\n",
       "      <td>C[C@@H]1O[C@@H](OC[C@H]2O[C@@H](Oc3c(-c4ccc(O)...</td>\n",
       "      <td>rutin|RUTIN</td>\n",
       "      <td>610.153385</td>\n",
       "      <td>269.43</td>\n",
       "      <td>6</td>\n",
       "      <td>10</td>\n",
       "      <td>16</td>\n",
       "      <td>-1.68710</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>71</th>\n",
       "      <td>O=c1c(O)c(-c2ccc(O)cc2O)oc2cc(O[C@@H]3O[C@H](C...</td>\n",
       "      <td>morin-7-O-beta-D-glucopyranoside|morin-7-O-bet...</td>\n",
       "      <td>464.095476</td>\n",
       "      <td>210.51</td>\n",
       "      <td>4</td>\n",
       "      <td>8</td>\n",
       "      <td>12</td>\n",
       "      <td>-0.53890</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>O=c1c(O[C@@H]2O[C@H](CO)[C@@H](O)[C@H](O)[C@H]...</td>\n",
       "      <td>astragalin|kaempferol-3-glucoside|Astragalin|5...</td>\n",
       "      <td>448.100562</td>\n",
       "      <td>190.28</td>\n",
       "      <td>4</td>\n",
       "      <td>7</td>\n",
       "      <td>11</td>\n",
       "      <td>-0.24450</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75</th>\n",
       "      <td>Cc1c(O[C@@H]2OC[C@@H](O)[C@H](O[C@@H]3OC[C@@H]...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>534.173726</td>\n",
       "      <td>184.60</td>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "      <td>12</td>\n",
       "      <td>-0.31178</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>63</th>\n",
       "      <td>CC1C=CC2(C(C)C)CC12</td>\n",
       "      <td>4-methylene-1-(1-methylethyl)bicyclo[3.1.0]hex...</td>\n",
       "      <td>136.125200</td>\n",
       "      <td>0.00</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>2.85460</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>64</th>\n",
       "      <td>C=CC(=C)CCC=C(C)CCC=C(C)C</td>\n",
       "      <td>7,11-dimethyl-3-methylene-1,6,10-dodecatriene|...</td>\n",
       "      <td>204.187801</td>\n",
       "      <td>0.00</td>\n",
       "      <td>7</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>5.20150</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>67</th>\n",
       "      <td>CC1=CC2C(=C(C)CCCC2(C)C)CC1</td>\n",
       "      <td>2,4a,5,6,7,8-hexahydro-3,5,5,9-tetramethyl-1H-...</td>\n",
       "      <td>204.187801</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>4.86930</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>73</th>\n",
       "      <td>C=CC(C)=CC=CC(=C)C</td>\n",
       "      <td>2,6-dimethyl-1,3,5,7-octatetraene|2,6-dimethyl...</td>\n",
       "      <td>134.109550</td>\n",
       "      <td>0.00</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3.25100</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>79</th>\n",
       "      <td>C=CC1=C2CCC2CCC1</td>\n",
       "      <td>7-exo-ethenylbicyclo[4.2.0]oct-1-ene</td>\n",
       "      <td>134.109550</td>\n",
       "      <td>0.00</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3.06290</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>92 rows × 8 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                               smiles  \\\n",
       "76  COc1cc(O)c2c(=O)c(O[C@@H]3OC[C@@](O)(CO[C@@H]4...   \n",
       "86  C[C@@H]1O[C@@H](OC[C@H]2O[C@@H](Oc3c(-c4ccc(O)...   \n",
       "71  O=c1c(O)c(-c2ccc(O)cc2O)oc2cc(O[C@@H]3O[C@H](C...   \n",
       "28  O=c1c(O[C@@H]2O[C@H](CO)[C@@H](O)[C@H](O)[C@H]...   \n",
       "75  Cc1c(O[C@@H]2OC[C@@H](O)[C@H](O[C@@H]3OC[C@@H]...   \n",
       "..                                                ...   \n",
       "63                                CC1C=CC2(C(C)C)CC12   \n",
       "64                          C=CC(=C)CCC=C(C)CCC=C(C)C   \n",
       "67                        CC1=CC2C(=C(C)CCCC2(C)C)CC1   \n",
       "73                                 C=CC(C)=CC=CC(=C)C   \n",
       "79                                   C=CC1=C2CCC2CCC1   \n",
       "\n",
       "                                                 name       MolWt    TPSA  \\\n",
       "76                                                NaN  726.200729  297.12   \n",
       "86                                        rutin|RUTIN  610.153385  269.43   \n",
       "71  morin-7-O-beta-D-glucopyranoside|morin-7-O-bet...  464.095476  210.51   \n",
       "28  astragalin|kaempferol-3-glucoside|Astragalin|5...  448.100562  190.28   \n",
       "75                                                NaN  534.173726  184.60   \n",
       "..                                                ...         ...     ...   \n",
       "63  4-methylene-1-(1-methylethyl)bicyclo[3.1.0]hex...  136.125200    0.00   \n",
       "64  7,11-dimethyl-3-methylene-1,6,10-dodecatriene|...  204.187801    0.00   \n",
       "67  2,4a,5,6,7,8-hexahydro-3,5,5,9-tetramethyl-1H-...  204.187801    0.00   \n",
       "73  2,6-dimethyl-1,3,5,7-octatetraene|2,6-dimethyl...  134.109550    0.00   \n",
       "79               7-exo-ethenylbicyclo[4.2.0]oct-1-ene  134.109550    0.00   \n",
       "\n",
       "    nRotB  HBD  HBA     LogP  \n",
       "76     11   10   19 -3.36280  \n",
       "86      6   10   16 -1.68710  \n",
       "71      4    8   12 -0.53890  \n",
       "28      4    7   11 -0.24450  \n",
       "75      5    6   12 -0.31178  \n",
       "..    ...  ...  ...      ...  \n",
       "63      1    0    0  2.85460  \n",
       "64      7    0    0  5.20150  \n",
       "67      0    0    0  4.86930  \n",
       "73      3    0    0  3.25100  \n",
       "79      1    0    0  3.06290  \n",
       "\n",
       "[92 rows x 8 columns]"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Sort by several columns: HBD first, then HBA to break ties (both descending)\n",
    "# Alternative key order: df.sort_values(by=['HBA', 'HBD'], ascending=False)\n",
    "df.sort_values(by=['HBD', 'HBA'], ascending=False)"
   ]
  },
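  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`ascending` also accepts a list with one flag per sort key. The sketch below uses an invented toy table (only the column names `HBD` and `HBA` come from this notebook) to sort by `HBD` descending and break ties by `HBA` ascending:\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy table; the values and index labels are invented\n",
    "toy = pd.DataFrame(\n",
    "    {'HBD': [0, 10, 10, 8], 'HBA': [0, 16, 19, 12]},\n",
    "    index=['a', 'b', 'c', 'd'],\n",
    ")\n",
    "\n",
    "# One ascending flag per key: HBD descending, then HBA ascending within ties\n",
    "mixed = toy.sort_values(by=['HBD', 'HBA'], ascending=[False, True])\n",
    "order = list(mixed.index)\n",
    "\n",
    "print(order)\n",
    "print(list(toy.index))  # without inplace=True, toy keeps its original order\n",
    "```\n",
    "\n",
    "The index labels travel with their rows, which is why the sorted tables above show row labels out of numeric order."
   ]
  },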
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.3.4 Filtering and selecting rows"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Average Rt(min)</th>\n",
       "      <th>Average Mz</th>\n",
       "      <th>Metabolite name</th>\n",
       "      <th>Adduct type</th>\n",
       "      <th>Formula</th>\n",
       "      <th>Ontology</th>\n",
       "      <th>INCHIKEY</th>\n",
       "      <th>SMILES</th>\n",
       "      <th>MS/MS matched</th>\n",
       "      <th>Matched peaks count</th>\n",
       "      <th>Matched peaks percentage</th>\n",
       "      <th>Total score</th>\n",
       "      <th>G1_2</th>\n",
       "      <th>G1_3</th>\n",
       "      <th>G2_1</th>\n",
       "      <th>G2_2</th>\n",
       "      <th>G2_3</th>\n",
       "      <th>log_G2_1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>253</th>\n",
       "      <td>19.544</td>\n",
       "      <td>585.27020</td>\n",
       "      <td>C00486;Bilirubin; LC-tDDA; CE40;MoNA</td>\n",
       "      <td>[M+H]+</td>\n",
       "      <td>C33H36N4O6</td>\n",
       "      <td>NaN</td>\n",
       "      <td>BPYKTIZUTYGOLE-IFADSCNNSA-N</td>\n",
       "      <td>O=C(O)CCC1=C(NC(C=C2N=C(O)C(=C2C=C)C)=C1C)CC=3...</td>\n",
       "      <td>True</td>\n",
       "      <td>8</td>\n",
       "      <td>0.50</td>\n",
       "      <td>1.61</td>\n",
       "      <td>3.563278e+06</td>\n",
       "      <td>2.546032e+06</td>\n",
       "      <td>6.217936e+06</td>\n",
       "      <td>6.133714e+06</td>\n",
       "      <td>4.558848e+06</td>\n",
       "      <td>6.793646</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>253</th>\n",
       "      <td>19.544</td>\n",
       "      <td>585.27020</td>\n",
       "      <td>C00486;Bilirubin; LC-tDDA; CE40;MoNA</td>\n",
       "      <td>[M+H]+</td>\n",
       "      <td>C33H36N4O6</td>\n",
       "      <td>NaN</td>\n",
       "      <td>BPYKTIZUTYGOLE-IFADSCNNSA-N</td>\n",
       "      <td>O=C(O)CCC1=C(NC(C=C2N=C(O)C(=C2C=C)C)=C1C)CC=3...</td>\n",
       "      <td>True</td>\n",
       "      <td>8</td>\n",
       "      <td>0.50</td>\n",
       "      <td>1.61</td>\n",
       "      <td>3.563278e+06</td>\n",
       "      <td>2.546032e+06</td>\n",
       "      <td>6.217936e+06</td>\n",
       "      <td>6.133714e+06</td>\n",
       "      <td>4.558848e+06</td>\n",
       "      <td>6.793646</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>252</th>\n",
       "      <td>18.058</td>\n",
       "      <td>559.51636</td>\n",
       "      <td>C17446;Kurarinone;inhouse</td>\n",
       "      <td>[M+UNK]+</td>\n",
       "      <td>C26H30O6</td>\n",
       "      <td>NaN</td>\n",
       "      <td>LTTQKYMNTNISSZ-MWTRTKDXSA-N</td>\n",
       "      <td>OC(C=C1)=CC(O)=C1[C@@H]2CC(C3=C(OC)C=C(O)C(C[C...</td>\n",
       "      <td>True</td>\n",
       "      <td>3</td>\n",
       "      <td>0.67</td>\n",
       "      <td>1.66</td>\n",
       "      <td>3.194407e+07</td>\n",
       "      <td>1.403485e+07</td>\n",
       "      <td>6.160364e+07</td>\n",
       "      <td>4.516781e+07</td>\n",
       "      <td>4.575779e+07</td>\n",
       "      <td>7.789606</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>250</th>\n",
       "      <td>17.750</td>\n",
       "      <td>559.51630</td>\n",
       "      <td>C17446;Kurarinone;inhouse</td>\n",
       "      <td>[M+UNK]+</td>\n",
       "      <td>C26H30O6</td>\n",
       "      <td>NaN</td>\n",
       "      <td>LTTQKYMNTNISSZ-MWTRTKDXSA-N</td>\n",
       "      <td>OC(C=C1)=CC(O)=C1[C@@H]2CC(C3=C(OC)C=C(O)C(C[C...</td>\n",
       "      <td>True</td>\n",
       "      <td>3</td>\n",
       "      <td>0.67</td>\n",
       "      <td>1.66</td>\n",
       "      <td>5.850167e+07</td>\n",
       "      <td>2.076818e+07</td>\n",
       "      <td>2.766266e+08</td>\n",
       "      <td>7.997221e+07</td>\n",
       "      <td>6.547063e+07</td>\n",
       "      <td>8.441894</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>251</th>\n",
       "      <td>17.273</td>\n",
       "      <td>559.51630</td>\n",
       "      <td>C17446;Kurarinone;inhouse</td>\n",
       "      <td>[M+UNK]+</td>\n",
       "      <td>C26H30O6</td>\n",
       "      <td>NaN</td>\n",
       "      <td>LTTQKYMNTNISSZ-MWTRTKDXSA-N</td>\n",
       "      <td>OC(C=C1)=CC(O)=C1[C@@H]2CC(C3=C(OC)C=C(O)C(C[C...</td>\n",
       "      <td>True</td>\n",
       "      <td>3</td>\n",
       "      <td>0.67</td>\n",
       "      <td>1.66</td>\n",
       "      <td>1.928304e+07</td>\n",
       "      <td>6.540624e+06</td>\n",
       "      <td>1.726800e+07</td>\n",
       "      <td>9.403136e+06</td>\n",
       "      <td>2.106992e+07</td>\n",
       "      <td>7.237242</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>249</th>\n",
       "      <td>17.346</td>\n",
       "      <td>519.13837</td>\n",
       "      <td>C08817;Fucosterol;inhouse</td>\n",
       "      <td>[M+UNK]+</td>\n",
       "      <td>C29H48O</td>\n",
       "      <td>NaN</td>\n",
       "      <td>OSELKOCHBMDKEJ-JUGJNGJRSA-N</td>\n",
       "      <td>CC(C)/C(/CC[C@@H](C)[C@@H](CC1)[C@@](C)(CC2)[C...</td>\n",
       "      <td>True</td>\n",
       "      <td>33</td>\n",
       "      <td>0.30</td>\n",
       "      <td>1.45</td>\n",
       "      <td>3.131875e+06</td>\n",
       "      <td>4.062370e+06</td>\n",
       "      <td>4.634789e+06</td>\n",
       "      <td>3.882763e+06</td>\n",
       "      <td>3.667451e+06</td>\n",
       "      <td>6.666030</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>248</th>\n",
       "      <td>14.230</td>\n",
       "      <td>504.32309</td>\n",
       "      <td>C00022;Pyruvic acid;inhouse</td>\n",
       "      <td>[M+UNK]+</td>\n",
       "      <td>C3H4O3</td>\n",
       "      <td>NaN</td>\n",
       "      <td>LCTONWCANYUPML-UHFFFAOYSA-N</td>\n",
       "      <td>CC(C(O)=O)=O</td>\n",
       "      <td>True</td>\n",
       "      <td>14</td>\n",
       "      <td>0.36</td>\n",
       "      <td>1.52</td>\n",
       "      <td>3.503598e+06</td>\n",
       "      <td>2.431596e+06</td>\n",
       "      <td>8.036579e+06</td>\n",
       "      <td>3.166362e+06</td>\n",
       "      <td>2.250061e+06</td>\n",
       "      <td>6.905071</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Average Rt(min)  Average Mz                       Metabolite name  \\\n",
       "253           19.544   585.27020  C00486;Bilirubin; LC-tDDA; CE40;MoNA   \n",
       "253           19.544   585.27020  C00486;Bilirubin; LC-tDDA; CE40;MoNA   \n",
       "252           18.058   559.51636             C17446;Kurarinone;inhouse   \n",
       "250           17.750   559.51630             C17446;Kurarinone;inhouse   \n",
       "251           17.273   559.51630             C17446;Kurarinone;inhouse   \n",
       "249           17.346   519.13837             C08817;Fucosterol;inhouse   \n",
       "248           14.230   504.32309           C00022;Pyruvic acid;inhouse   \n",
       "\n",
       "    Adduct type     Formula Ontology                     INCHIKEY  \\\n",
       "253      [M+H]+  C33H36N4O6      NaN  BPYKTIZUTYGOLE-IFADSCNNSA-N   \n",
       "253      [M+H]+  C33H36N4O6      NaN  BPYKTIZUTYGOLE-IFADSCNNSA-N   \n",
       "252    [M+UNK]+    C26H30O6      NaN  LTTQKYMNTNISSZ-MWTRTKDXSA-N   \n",
       "250    [M+UNK]+    C26H30O6      NaN  LTTQKYMNTNISSZ-MWTRTKDXSA-N   \n",
       "251    [M+UNK]+    C26H30O6      NaN  LTTQKYMNTNISSZ-MWTRTKDXSA-N   \n",
       "249    [M+UNK]+     C29H48O      NaN  OSELKOCHBMDKEJ-JUGJNGJRSA-N   \n",
       "248    [M+UNK]+      C3H4O3      NaN  LCTONWCANYUPML-UHFFFAOYSA-N   \n",
       "\n",
       "                                                SMILES  MS/MS matched  \\\n",
       "253  O=C(O)CCC1=C(NC(C=C2N=C(O)C(=C2C=C)C)=C1C)CC=3...           True   \n",
       "253  O=C(O)CCC1=C(NC(C=C2N=C(O)C(=C2C=C)C)=C1C)CC=3...           True   \n",
       "252  OC(C=C1)=CC(O)=C1[C@@H]2CC(C3=C(OC)C=C(O)C(C[C...           True   \n",
       "250  OC(C=C1)=CC(O)=C1[C@@H]2CC(C3=C(OC)C=C(O)C(C[C...           True   \n",
       "251  OC(C=C1)=CC(O)=C1[C@@H]2CC(C3=C(OC)C=C(O)C(C[C...           True   \n",
       "249  CC(C)/C(/CC[C@@H](C)[C@@H](CC1)[C@@](C)(CC2)[C...           True   \n",
       "248                                       CC(C(O)=O)=O           True   \n",
       "\n",
       "     Matched peaks count  Matched peaks percentage  Total score          G1_2  \\\n",
       "253                    8                      0.50         1.61  3.563278e+06   \n",
       "253                    8                      0.50         1.61  3.563278e+06   \n",
       "252                    3                      0.67         1.66  3.194407e+07   \n",
       "250                    3                      0.67         1.66  5.850167e+07   \n",
       "251                    3                      0.67         1.66  1.928304e+07   \n",
       "249                   33                      0.30         1.45  3.131875e+06   \n",
       "248                   14                      0.36         1.52  3.503598e+06   \n",
       "\n",
       "             G1_3          G2_1          G2_2          G2_3  log_G2_1  \n",
       "253  2.546032e+06  6.217936e+06  6.133714e+06  4.558848e+06  6.793646  \n",
       "253  2.546032e+06  6.217936e+06  6.133714e+06  4.558848e+06  6.793646  \n",
       "252  1.403485e+07  6.160364e+07  4.516781e+07  4.575779e+07  7.789606  \n",
       "250  2.076818e+07  2.766266e+08  7.997221e+07  6.547063e+07  8.441894  \n",
       "251  6.540624e+06  1.726800e+07  9.403136e+06  2.106992e+07  7.237242  \n",
       "249  4.062370e+06  4.634789e+06  3.882763e+06  3.667451e+06  6.666030  \n",
       "248  2.431596e+06  8.036579e+06  3.166362e+06  2.250061e+06  6.905071  "
      ]
     },
     "execution_count": 88,
     "metadata": {},
     "output_type": "execute_result"
    }
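   ],
   "source": [
    "# Boolean-mask filtering also works without the .loc indexer\n",
    "# (no backslash is needed: a line can continue freely inside brackets)\n",
    "df[(df['Average Mz'] >= 500) &\n",
    "   (df['Average Mz'] <= 600)]"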
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 92,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "253    C00486;Bilirubin; LC-tDDA; CE40;MoNA\n",
       "253    C00486;Bilirubin; LC-tDDA; CE40;MoNA\n",
       "252               C17446;Kurarinone;inhouse\n",
       "250               C17446;Kurarinone;inhouse\n",
       "251               C17446;Kurarinone;inhouse\n",
       "249               C08817;Fucosterol;inhouse\n",
       "248             C00022;Pyruvic acid;inhouse\n",
       "Name: Metabolite name, dtype: object"
      ]
     },
     "execution_count": 92,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# With .loc, filter rows by a boolean mask and pick a column in one step\n",
    "df.loc[(df['Average Mz'] >= 500) &\n",
    "       (df['Average Mz'] <= 600), 'Metabolite name']"
   ]
  },
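  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each comparison in the filters above needs its own parentheses, because `&` binds more tightly than `>=` and `<=`. A minimal sketch on invented values, assuming an inclusive range is wanted, showing that `Series.between` builds the same mask:\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy table; the values and names are invented\n",
    "toy = pd.DataFrame({\n",
    "    'Average Mz': [72.08, 504.32, 585.27, 610.15],\n",
    "    'Metabolite name': ['m1', 'm2', 'm3', 'm4'],\n",
    "})\n",
    "\n",
    "# Explicit mask: one parenthesised comparison per condition\n",
    "mask = (toy['Average Mz'] >= 500) & (toy['Average Mz'] <= 600)\n",
    "\n",
    "# Same mask via between(), which includes both endpoints by default\n",
    "same_mask = toy['Average Mz'].between(500, 600)\n",
    "\n",
    "# .loc takes the row mask first, then the column label\n",
    "names = toy.loc[mask, 'Metabolite name'].tolist()\n",
    "print(names)\n",
    "```\n",
    "\n",
    "Storing the mask in a variable keeps longer filters readable and lets the same condition be reused."
   ]
  },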
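  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.3.5 DataFrame column operations\n",
    "\n",
    "- Most operations on `df` (column arithmetic, `drop`, `sort_values`, and so on) return a new object and leave the original `df` unchanged, unless you pass `inplace=True` or assign to a column of `df`.\n",
    "- To keep a result, assign it to a variable: either a new name, or the same name to overwrite the original."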
   ]
  },
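  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of the two points above, on an invented toy column: arithmetic on a column returns a new Series, while assigning to `toy[...]` changes the table itself. The use of `np.log10` for the log column is an assumption made for illustration.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy table; the values are invented for illustration\n",
    "toy = pd.DataFrame({'G2_1': [10.0, 100.0, 1000.0]})\n",
    "\n",
    "# Whole-column arithmetic returns a new Series; toy is not modified\n",
    "scaled = toy['G2_1'] * 100\n",
    "\n",
    "# Assigning to a column label does modify toy (it adds or overwrites a column)\n",
    "toy['log_G2_1'] = np.log10(toy['G2_1'])\n",
    "\n",
    "print(scaled.tolist())\n",
    "print(toy)\n",
    "```\n",
    "\n",
    "If only the display is needed, the expression can stand alone as the last line of a cell; assign it only when a later step needs the result."
   ]
  },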
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>G2_2</th>\n",
       "      <th>G2_1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>263</th>\n",
       "      <td>9.845110e+09</td>\n",
       "      <td>1.359576e+10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>262</th>\n",
       "      <td>4.896170e+09</td>\n",
       "      <td>6.269818e+09</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>262</th>\n",
       "      <td>4.896170e+09</td>\n",
       "      <td>6.269818e+09</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>261</th>\n",
       "      <td>5.215809e+09</td>\n",
       "      <td>3.487791e+09</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>260</th>\n",
       "      <td>9.164309e+09</td>\n",
       "      <td>2.734778e+09</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>9.984601e+11</td>\n",
       "      <td>9.390745e+11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>7.181890e+09</td>\n",
       "      <td>7.484436e+09</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3.765911e+09</td>\n",
       "      <td>6.023802e+09</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>8.490319e+09</td>\n",
       "      <td>5.367473e+09</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2.554517e+10</td>\n",
       "      <td>3.024013e+10</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>284 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "             G2_2          G2_1\n",
       "263  9.845110e+09  1.359576e+10\n",
       "262  4.896170e+09  6.269818e+09\n",
       "262  4.896170e+09  6.269818e+09\n",
       "261  5.215809e+09  3.487791e+09\n",
       "260  9.164309e+09  2.734778e+09\n",
       "..            ...           ...\n",
       "4    9.984601e+11  9.390745e+11\n",
       "3    7.181890e+09  7.484436e+09\n",
       "2    3.765911e+09  6.023802e+09\n",
       "1    8.490319e+09  5.367473e+09\n",
       "0    2.554517e+10  3.024013e+10\n",
       "\n",
       "[284 rows x 2 columns]"
      ]
     },
     "execution_count": 83,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[['G2_2', 'G2_1']] * 100"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "263    2.344087e+08\n",
       "262    1.116599e+08\n",
       "262    1.116599e+08\n",
       "261    8.703600e+07\n",
       "260    1.189909e+08\n",
       "           ...     \n",
       "4      1.937535e+10\n",
       "3      1.466633e+08\n",
       "2      9.789713e+07\n",
       "1      1.385779e+08\n",
       "0      5.578530e+08\n",
       "Length: 284, dtype: float64"
      ]
     },
     "execution_count": 84,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['G2_2'] + df['G2_1']"
   ]
  },
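  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To illustrate that column arithmetic returns a new object, the sketch below uses a made-up two-row table. The names `demo`, `scaled`, and `G2_sum` are hypothetical and chosen only for this example. The original values stay unchanged until a result is assigned."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data standing in for two peak-area columns\n",
    "demo = pd.DataFrame({'G2_1': [1.0, 2.0], 'G2_2': [3.0, 4.0]})\n",
    "\n",
    "# The expression builds a new DataFrame; demo itself is untouched\n",
    "scaled = demo[['G2_2', 'G2_1']] * 100\n",
    "\n",
    "# To keep a result, assign it to a new variable or to a new column\n",
    "demo['G2_sum'] = demo['G2_1'] + demo['G2_2']\n",
    "demo"
   ]
  },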
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>TPSA</th>\n",
       "      <th>HBD</th>\n",
       "      <th>HBA</th>\n",
       "      <th>LogP</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>8.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>65.0</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>55.0</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>45.0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>87</th>\n",
       "      <td>48.0</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>-2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>88</th>\n",
       "      <td>55.0</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>89</th>\n",
       "      <td>82.0</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>-1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>90</th>\n",
       "      <td>48.0</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>91</th>\n",
       "      <td>59.0</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>-2.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>92 rows × 4 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "    TPSA  HBD  HBA  LogP\n",
       "0    8.0    0    0   0.0\n",
       "1   65.0    2    3   0.0\n",
       "2   55.0    2    3   1.0\n",
       "3   45.0    1    2   1.0\n",
       "4    0.0    0    0   1.0\n",
       "..   ...  ...  ...   ...\n",
       "87  48.0    2    2  -2.0\n",
       "88  55.0    2    3   1.0\n",
       "89  82.0    3    4  -1.0\n",
       "90  48.0    2    2   0.0\n",
       "91  59.0    2    3  -2.0\n",
       "\n",
       "[92 rows x 4 columns]"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Besides +, -, * and /, pandas supports floor division, modulo, and exponentiation\n",
    "## Floor division: the quotient rounded down\n",
    "df_v3 // 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>TPSA</th>\n",
       "      <th>HBD</th>\n",
       "      <th>HBA</th>\n",
       "      <th>LogP</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>291.3849</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2.247301</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>17255.4496</td>\n",
       "      <td>25</td>\n",
       "      <td>49</td>\n",
       "      <td>3.952144</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>12349.8769</td>\n",
       "      <td>16</td>\n",
       "      <td>36</td>\n",
       "      <td>5.209350</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>8262.8100</td>\n",
       "      <td>9</td>\n",
       "      <td>25</td>\n",
       "      <td>6.639898</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.0000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>8.992202</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>87</th>\n",
       "      <td>9602.0401</td>\n",
       "      <td>16</td>\n",
       "      <td>25</td>\n",
       "      <td>7.505956</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>88</th>\n",
       "      <td>12349.8769</td>\n",
       "      <td>16</td>\n",
       "      <td>36</td>\n",
       "      <td>5.209350</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>89</th>\n",
       "      <td>27142.5625</td>\n",
       "      <td>36</td>\n",
       "      <td>64</td>\n",
       "      <td>0.417187</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>90</th>\n",
       "      <td>9602.0401</td>\n",
       "      <td>16</td>\n",
       "      <td>16</td>\n",
       "      <td>0.251603</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>91</th>\n",
       "      <td>13975.9684</td>\n",
       "      <td>25</td>\n",
       "      <td>36</td>\n",
       "      <td>11.416289</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>92 rows × 4 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "          TPSA  HBD  HBA       LogP\n",
       "0     291.3849    0    1   2.247301\n",
       "1   17255.4496   25   49   3.952144\n",
       "2   12349.8769   16   36   5.209350\n",
       "3    8262.8100    9   25   6.639898\n",
       "4       0.0000    0    0   8.992202\n",
       "..         ...  ...  ...        ...\n",
       "87   9602.0401   16   25   7.505956\n",
       "88  12349.8769   16   36   5.209350\n",
       "89  27142.5625   36   64   0.417187\n",
       "90   9602.0401   16   16   0.251603\n",
       "91  13975.9684   25   36  11.416289\n",
       "\n",
       "[92 rows x 4 columns]"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Exponentiation\n",
    "df_v3 ** 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "execution_count": 85,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Modulo: the remainder of the division\n",
    "100 % 3"
   ]
  },
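  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `%` operator also works element-wise on a whole DataFrame, just like `//` and `**`. The sketch below uses a small made-up integer table (`demo` and its values are assumptions for illustration) and checks that quotient and remainder together recover the original values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data; the column names mirror df_v3\n",
    "demo = pd.DataFrame({'HBD': [0, 5, 7], 'HBA': [1, 6, 9]})\n",
    "\n",
    "quotient = demo // 2    # floor division, element by element\n",
    "remainder = demo % 2    # remainder, element by element\n",
    "squared = demo ** 2     # exponentiation, element by element\n",
    "\n",
    "# quotient * 2 + remainder gives back the original values\n",
    "restored = quotient * 2 + remainder\n",
    "restored.equals(demo)"
   ]
  },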
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.3.5 Removing duplicates"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'INCHIKEY'"
      ]
     },
     "execution_count": 87,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "'Metabolite name'\n",
    "'INCHIKEY'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Metabolite name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>263</th>\n",
       "      <td>C06426;Gamma-Linolenic acid;inhouse</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>262</th>\n",
       "      <td>C02137;Phenylglyoxylic acid;inhouse</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>261</th>\n",
       "      <td>C16971;Hirsuteine;inhouse</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>259</th>\n",
       "      <td>C03044;(2R,3R)-Butane-2,3-diol;inhouse</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>257</th>\n",
       "      <td>C16527;Adrenic acid;inhouse</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>C19262;C19262;CFMID</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>C10107;myricetin;inhouse</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>C11143;Dimethyl?sulfoxide;msd</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>C00263;L-Homoserine;inhouse</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>C12244;C12244;CFMID</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>144 rows × 1 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                            Metabolite name\n",
       "263     C06426;Gamma-Linolenic acid;inhouse\n",
       "262     C02137;Phenylglyoxylic acid;inhouse\n",
       "261               C16971;Hirsuteine;inhouse\n",
       "259  C03044;(2R,3R)-Butane-2,3-diol;inhouse\n",
       "257             C16527;Adrenic acid;inhouse\n",
       "..                                      ...\n",
       "7                       C19262;C19262;CFMID\n",
       "6                  C10107;myricetin;inhouse\n",
       "5             C11143;Dimethyl?sulfoxide;msd\n",
       "2               C00263;L-Homoserine;inhouse\n",
       "1                       C12244;C12244;CFMID\n",
       "\n",
       "[144 rows x 1 columns]"
      ]
     },
     "execution_count": 91,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.drop_duplicates(subset='INCHIKEY')[['Metabolite name']]"
   ]
  },
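  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By default `drop_duplicates` keeps the first row of each duplicated key; the `keep` parameter changes that, and `duplicated` reports which rows would be dropped. The following is a minimal sketch on made-up data (the keys and names in `demo` are hypothetical placeholders, not real InChIKeys)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data with one repeated key\n",
    "demo = pd.DataFrame({\n",
    "    'INCHIKEY': ['KEY-A', 'KEY-B', 'KEY-A', 'KEY-C'],\n",
    "    'Metabolite name': ['first A', 'B', 'second A', 'C'],\n",
    "})\n",
    "\n",
    "# Count the rows whose key has already appeared earlier\n",
    "n_dup = int(demo.duplicated(subset='INCHIKEY').sum())\n",
    "\n",
    "# keep='first' is the default; keep='last' retains the last occurrence\n",
    "unique_first = demo.drop_duplicates(subset='INCHIKEY')\n",
    "unique_last = demo.drop_duplicates(subset='INCHIKEY', keep='last')\n",
    "unique_last"
   ]
  },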
  {
   "cell_type": "code",
   "execution_count": 93,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.to_csv('data/06_df.csv', index=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 94,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.to_pickle('data/06_df.pkl')\n",
    "\n",
    "# Serialized (pickled) data file"
   ]
  },
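  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The two output formats behave differently when read back: a pickle restores the DataFrame with its column types, while a CSV is plain text and types are inferred again on reading. The sketch below shows this round trip under stated assumptions: the `demo` table, the file names, and the temporary directory are all hypothetical and unrelated to `data/06_df.csv`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import tempfile\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data with a datetime column\n",
    "demo = pd.DataFrame({\n",
    "    'name': ['A', 'B'],\n",
    "    'when': pd.to_datetime(['2024-01-01', '2024-02-01']),\n",
    "})\n",
    "\n",
    "# Write both formats to a temporary directory and read them back\n",
    "with tempfile.TemporaryDirectory() as tmp:\n",
    "    csv_path = os.path.join(tmp, 'demo.csv')\n",
    "    pkl_path = os.path.join(tmp, 'demo.pkl')\n",
    "    demo.to_csv(csv_path, index=False)\n",
    "    demo.to_pickle(pkl_path)\n",
    "    from_csv = pd.read_csv(csv_path)\n",
    "    from_pkl = pd.read_pickle(pkl_path)\n",
    "\n",
    "# The pickle round trip preserves values and dtypes;\n",
    "# read_csv does not parse dates unless asked to (see parse_dates)\n",
    "from_pkl.equals(demo), from_csv['when'].dtype"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Pickle files should only be loaded from sources you trust, because unpickling can execute arbitrary code."
   ]
  },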
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.3.6 Data distribution"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 95,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Axes: ylabel='Frequency'>"
      ]
     },
     "execution_count": 95,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAjsAAAGdCAYAAAD0e7I1AAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjEsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvc2/+5QAAAAlwSFlzAAAPYQAAD2EBqD+naQAAIlNJREFUeJzt3QmUleV9P/AfOAiILC5hKyDUJS64olWUNFU4onIsLifVBE9QqcYtimgMJC5NYwRNSpREIbGKcDQabZUYqRBFq9EgiGtciigoRGVpKKtlv//zvDkzfwbBpDh6733m8znn9c67zOWZxzv3fufZ3ialUqkUAACZalruAgAAfJaEHQAga8IOAJA1YQcAyJqwAwBkTdgBALIm7AAAWRN2AICs1ZS7AJVg06ZN8cEHH0Tr1q2jSZMm5S4OAPAXSOsir1y5Mjp37hxNm267/UbYiSiCTteuXctdDABgOyxYsCC6dOmyzfPCTkTRolNbWW3atCl3cQCAv8CKFSuKxoraz/FtEXYi6rquUtARdgCguvy5ISgGKAMAWRN2AICsCTsAQNaEHQAga8IOAJA1YQcAyJqwAwBkTdgBALIm7AAAWRN2AICsCTsAQNaEHQAga8IOAJA1YQcAyFpNuQtA5ek+fHJUm3dHDSh3EQCoUFp2AICsCTsAQNaEHQAga8IOAJA1YQcAyJqwAwBkTdgBALIm7AAAWRN2AICsCTsAQNaEHQAga8IOAJA1YQcAyJqwAwBkTdgBALIm7AAAWRN2AICsCTsAQNaEHQAga8IOAJA1YQcAyJqwAwBkTdgBALIm7AAAWRN2AICsCTsAQNaEHQAga8IOAJA1YQcAyJqwAwBkTdgBALIm7AAAWRN2AICslTXsPP3003HyySdH586do0mTJjFp0qR650ulUlx77bXRqVOnaNmyZfTr1y/mzJlT75qlS5fGoEGDok2bNtGuXbsYMmRIrFq16nP+SQCASlXWsLN69eo4+OCD49Zbb93q+ZtuuinGjBkT48aNixkzZkSrVq2if//+sWbNmrprUtB5/fXX47HHHotHHnmkCFDnn3/+5/hTAACVrKac//iJJ55YbFuTWnVuvvnmuPrqq2PgwIHFsYkTJ0aHDh2KFqAzzzwz3nzzzZgyZUo8//zzcfjhhxfX/OQnP4mTTjopfvSjHxUtRgBA41axY3bmzZsXCxcuLLquarVt2zaOPPLImD59erGfHlPXVW3QSdL1TZs2LVqCtmXt2rWxYsWKehsAkKeKDTsp6CSpJWdzab/2XHps3759vfM1NTWx66671l2zNSNHjiyCU+3WtWvXz+RnAADKr2LDzmdpxIgRsXz58rptwYIF5S4SANDYwk7Hjh2Lx0WLFtU7nvZrz6XHxYsX1zu/YcOGYoZW7TVb07x582L21uYbAJCnig07PXr0KALLtGnT6o6lsTVpLE7v3r2L/fS4bNmyeOGFF+queeKJJ2LTpk3F2B4AgLLOxkrr4bz99tv1BiW//PLLxZibbt26xdChQ+P666+Pvffeuwg/11xzTTHD6pRTTimu32+//eKEE06I8847r5ievn79+rjkkkuKmVpmYgEAZQ87s2bNimOPPbZuf9iwYcXj4MGD46677oqrrrqqWIsnrZuTWnD69OlTTDVv0aJF3ffcc889RcDp27dvMQvr9NNPL9bmAQBImpTSgjaNXOoeS7Oy0mBl43ciug+fHNXm3VEDyl0EACr087tix+wAADQEYQcAyJqwAwBkTdgBALIm7AAAWRN2AICsCTsAQNaEHQAga8IOAJA1YQcAyJqwAwBkTdgBALIm7AAAWRN2AICsCTsAQNaEHQAga8IOAJA1YQcAyJqwAwBkTdgBALIm7AAAWRN2AICsCTsAQNaEHQAga8IOAJA1YQcAyJqwAwBkTdgBALIm7AAAWRN2AICsCTsAQNaEHQAga8IOAJA1YQcAyJqwAwBkTdgBALIm7AAAWRN2
AICsCTsAQNaEHQAga8IOAJA1YQcAyJqwAwBkTdgBALIm7AAAWRN2AICsCTsAQNaEHQAga8IOAJA1YQcAyJqwAwBkTdgBALJW0WFn48aNcc0110SPHj2iZcuWseeee8b3v//9KJVKddekr6+99tro1KlTcU2/fv1izpw5ZS03AFA5Kjrs3HjjjTF27Nj46U9/Gm+++Waxf9NNN8VPfvKTumvS/pgxY2LcuHExY8aMaNWqVfTv3z/WrFlT1rIDAJWhJirY7373uxg4cGAMGDCg2O/evXvce++9MXPmzLpWnZtvvjmuvvrq4rpk4sSJ0aFDh5g0aVKceeaZZS0/AFB+Fd2yc/TRR8e0adPirbfeKvZfeeWVeOaZZ+LEE08s9ufNmxcLFy4suq5qtW3bNo488siYPn36Np937dq1sWLFinobAJCnim7ZGT58eBFE9t1339hhhx2KMTw/+MEPYtCgQcX5FHSS1JKzubRfe25rRo4cGd/73vc+49IDAJWgolt27r///rjnnnviF7/4Rbz44osxYcKE+NGPflQ8fhojRoyI5cuX120LFixosDIDAJWlolt2vvWtbxWtO7Vjbw488MB47733ipaZwYMHR8eOHYvjixYtKmZj1Ur7hxxyyDaft3nz5sUGAOSvolt2Pvroo2jatH4RU3fWpk2biq/TlPQUeNK4nlqp2yvNyurdu/fnXl4AoPJUdMvOySefXIzR6datWxxwwAHx0ksvxejRo+Pcc88tzjdp0iSGDh0a119/fey9995F+Enr8nTu3DlOOeWUchcfAKgAFR120no6KbxcdNFFsXjx4iLEfOMb3ygWEax11VVXxerVq+P888+PZcuWRZ8+fWLKlCnRokWLspYdAKgMTUqbL0fcSKWurzRlPQ1WbtOmTTR23YdPjmrz7qg/rcUEQOOx4i/8/K7oMTsAAJ+WsAMAZE3YAQCyJuwAAFkTdgCArAk7AEDWhB0AIGvCDgCQNWEHAMiasAMAZE3YAQCyJuwAAFkTdgCArAk7AEDWhB0AIGvCDgCQNWEHAMiasAMAZE3YAQCyJuwAAFkTdgCArAk7AEDWhB0AIGvCDgCQNWEHAMiasAMAZE3YAQCytl1hZ+7cuQ1fEgCASgk7e+21Vxx77LFx9913x5o1axq+VAAA5Qw7L774Yhx00EExbNiw6NixY3zjG9+ImTNnNlSZAADKG3YOOeSQuOWWW+KDDz6IO++8Mz788MPo06dP9OzZM0aPHh1LlixpuBICAJRrgHJNTU2cdtpp8cADD8SNN94Yb7/9dlx55ZXRtWvX+PrXv16EIACAqg07s2bNiosuuig6depUtOikoPPOO+/EY489VrT6DBw4sOFKCgCwHWq255tSsBk/fnzMnj07TjrppJg4cWLx2LTpn7JTjx494q677oru3btvz9MDAJQ37IwdOzbOPffcOPvss4tWna1p37593HHHHZ+2fAAAn3/YmTNnzp+9Zscdd4zBgwdvz9MDAJR3zE7qwkqDkreUjk2YMKEhygUA0CC2K+yMHDkydt999612Xd1www0NUS4AgPKFnfnz5xeDkLe0xx57FOcAAKo67KQWnFdfffVjx1955ZXYbbfdGqJcAADlCztf/epX49JLL40nn3wyNm7cWGxPPPFEXHbZZXHmmWc2TMkAAMo1G+v73/9+vPvuu9G3b99iFeVk06ZNxarJxuwAAFUfdtK08l/+8pdF6EldVy1btowDDzywGLMDAFD1YafWPvvsU2wAAFmFnTRGJ90OYtq0abF48eKiC2tzafwOAEDVhp00EDmFnQEDBkTPnj2jSZMmDV8yAIByhZ377rsv7r///uLmnwAA2U09TwOU99prr4YvDQBAJYSdK664Im655ZYolUoNXR4AgPJ3Yz3zzDPFgoKPPvpoHHDAAdGsWbN65x988MGGKh8AwOcfdtq1axennnrqp/uXAQAqNeyMHz++4UsCAFApY3aSDRs2xOOPPx4/+9nPYuXK
lcWxDz74IFatWtWQ5Yv3338/zjrrrOIGo7UrNc+aNavufBo3dO2110anTp2K8/369Ys5c+Y0aBkAgEYWdt57770idAwcODAuvvjiWLJkSXH8xhtvjCuvvLLBCvc///M/ccwxxxRjgtL4oDfeeCP+5V/+JXbZZZe6a2666aYYM2ZMjBs3LmbMmBGtWrWK/v37x5o1axqsHABAI1xU8PDDDy/ui5VaXGqlcTznnXdegxUuhaeuXbvW6zbr0aNHvVadm2++Oa6++uoieCUTJ06MDh06xKRJk9yBHQDYvpad3/72t0XASOvtbK579+5Ft1NDefjhh4tQ9ZWvfCXat28fhx56aNx+++115+fNmxcLFy4suq5qtW3bNo488siYPn36Np937dq1sWLFinobAJCn7Qo76V5Y6f5YW/rDH/4QrVu3joYyd+7cGDt2bOy9994xderUuPDCC+PSSy+NCRMmFOdT0ElSS87m0n7tua0ZOXJkEYpqt9R6BADkabvCzvHHH190H9VK98ZKA5Ovu+66Br2FRApVhx12WNxwww1Fq875559fdJOl8TmfxogRI2L58uV124IFCxqszABABmEnDRJ+9tlnY//99y8GAn/ta1+r68JK42waSpphlf6Nze23334xf/784uuOHTsWj4sWLap3TdqvPbc1zZs3jzZt2tTbAIA8bdcA5S5duhSDk9MNQV999dWiVWfIkCExaNCgYvp3Q0kzsWbPnl3v2FtvvRV77LFH3WDlFGqmTZsWhxxySHEsjb9Js7JSlxcAQM12f2NNTbH+zWfp8ssvj6OPPrroxvqHf/iHmDlzZvz85z8vttrus6FDh8b1119fjOtJ4eeaa66Jzp07xymnnPKZlg0AyDjspOndn+TrX/96NIQjjjgiHnrooWKMzT//8z8XYSaNFUotSLWuuuqqWL16dTGeZ9myZdGnT5+YMmVKtGjRokHKAABUtyal7bh1+eaL+iXr16+Pjz76qJiKvtNOO8XSpUujmqSurzQrKw1WNn4novvwyVFt3h01oNxFAKBCP7+bbu/KxptvacxOGluTWlXuvffeT1NuAIDKuDfWltKYmVGjRhWrKwMAZBd2agctp5uBAgBU9QDldBuHzaVhPx9++GH89Kc/LaaLAwBUddjZclp3mgL+hS98IY477rhiwUEAgKoOO+k2DuQ7swkActKgY3YAALJo2Rk2bNhffO3o0aO3558AAChf2HnppZeKLS0m+MUvfrHunlU77LBDcZfyzcfyAABUXdg5+eSTo3Xr1jFhwoS61ZTT4oLnnHNOfOlLX4orrriiocsJAPD5jdlJM65GjhxZ77YR6et0Q06zsQCAqg876V4US5Ys+djxdGzlypUNUS4AgPKFnVNPPbXosnrwwQfjD3/4Q7H9+7//ewwZMiROO+20hikZAEC5xuyMGzcurrzyyvja175WDFIunqimpgg7P/zhDxuiXAAA5Qs7O+20U9x2221FsHnnnXeKY3vuuWe0atWqYUoFAFAJiwqm+2GlLd3xPAWddI8sAICqDzt//OMfo2/fvrHPPvvESSedVASeJHVjmXYOAFSS7Qo7l19+eTRr1izmz59fdGnVOuOMM2LKlCkNWT4AgM9/zM5vfvObmDp1anTp0qXe8dSd9d577326EgEAlLtlZ/Xq1fVadGotXbo0mjdv3hDlAgAoX9hJt4SYOHFivXtgbdq0KW666aY49thjG6ZkAADl6sZKoSYNUJ41a1asW7currrqqnj99deLlp1nn322IcoFAFC+lp2ePXsWdznv06dPDBw4sOjWSisnpzuhp/V2AACqtmUnrZh8wgknFKsof/e73/1sSgUAUK6WnTTl/NVXX22ofx8AoPK6sc4666y44447Gr40AACVMEB5w4YNceedd8bjjz8evXr1+tg9sUaPHt1Q5QMA+PzCzty5c6N79+7x2muvxWGHHVYcSwOVN5emoQMAVGXYSSskp/tgPfnkk3W3hxgzZkx0
6NDhsyofAMDnN2Zny7uaP/roo8W0cwCArAYobyv8AABUddhJ43G2HJNjjA4AkM2YndSSc/bZZ9fd7HPNmjVxwQUXfGw21oMPPtiwpQQA+DzCzuDBgz+23g4AQDZhZ/z48Z9dSQAAKm2AMgBApRN2AICsCTsAQNaEHQAga8IOAJA1YQcAyJqwAwBkTdgBALIm7AAAWRN2AICsCTsAQNaEHQAga8IOAJA1YQcAyJqwAwBkTdgBALJWVWFn1KhR0aRJkxg6dGjdsTVr1sTFF18cu+22W+y8885x+umnx6JFi8paTgCgclRN2Hn++efjZz/7WRx00EH1jl9++eXx61//Oh544IF46qmn4oMPPojTTjutbOUEACpLVYSdVatWxaBBg+L222+PXXbZpe748uXL44477ojRo0fHcccdF7169Yrx48fH7373u3juuefKWmYAoDJURdhJ3VQDBgyIfv361Tv+wgsvxPr16+sd33fffaNbt24xffr0MpQUAKg0NVHh7rvvvnjxxReLbqwtLVy4MHbcccdo165dveMdOnQozm3L2rVri63WihUrGrjUAEClqOiWnQULFsRll10W99xzT7Ro0aLBnnfkyJHRtm3buq1r164N9twAQGWp6LCTuqkWL14chx12WNTU1BRbGoQ8ZsyY4uvUgrNu3bpYtmxZve9Ls7E6duy4zecdMWJEMd6ndkuhCgDIU0V3Y/Xt2zd+//vf1zt2zjnnFONyvv3tbxctMs2aNYtp06YVU86T2bNnx/z586N3797bfN7mzZsXGwCQv4oOO61bt46ePXvWO9aqVatiTZ3a40OGDIlhw4bFrrvuGm3atIlvfvObRdA56qijylRqAKCSVHTY+Uv8+Mc/jqZNmxYtO2nQcf/+/eO2224rd7EAgArRpFQqlaKRS7Ox0kDlNH4ntQ41pO7DJzfo87F1744aUO4iAFChn98VPUAZAODTEnYAgKwJOwBA1oQdACBrwg4AkDVhBwDImrADAGRN2AEAsibsAABZE3YAgKwJOwBA1oQdACBrwg4AkDVhBwDImrADAGRN2AEAsibsAABZE3YAgKwJOwBA1oQdACBrwg4AkDVhBwDIWk25CwDwWes+fHJUm3dHDSh3ESAbWnYAgKwJOwBA1oQdACBrwg4AkDVhBwDImtlYZMFsGwC2RcsOAJA1YQcAyJqwAwBkTdgBALIm7AAAWRN2AICsCTsAQNaEHQAga8IOAJA1YQcAyJqwAwBkTdgBALIm7AAAWRN2AICsCTsAQNaEHQAga8IOAJA1YQcAyJqwAwBkTdgBALIm7AAAWRN2AICsCTsAQNYqOuyMHDkyjjjiiGjdunW0b98+TjnllJg9e3a9a9asWRMXX3xx7LbbbrHzzjvH6aefHosWLSpbmQGAylITFeypp54qgkwKPBs2bIjvfOc7cfzxx8cbb7wRrVq1Kq65/PLLY/LkyfHAAw9E27Zt45JLLonTTjstnn322XIXHz5R9+GToxq9O2pAuYsAkE/YmTJlSr39u+66q2jheeGFF+Jv//ZvY/ny5XHHHXfEL37xizjuuOOKa8aPHx/77bdfPPfcc3HUUUeVqeQAQKWo6G6sLaVwk+y6667FYwo969evj379+tVds++++0a3bt1i+vTp23yetWvXxooVK+ptAECeqibsbNq0KYYOHRrHHHNM9OzZszi2cOHC2HHHHaNdu3b1ru3QoUNx7pPGAqUur9qta9eun3n5AYDyqJqwk8buvPbaa3Hfffd96ucaMWJE0UpUuy1YsKBByggAVJ6KHrNTKw06fuSRR+Lpp5+OLl261B3v2LFjrFu3LpYtW1avdSfNxkrntqV58+bFBgDkr6JbdkqlUhF0HnrooXjiiSeiR48e9c736tUrmjVrFtOmTas7lqamz58/P3r37l2GEgMAlaam0ruu0kyrX/3qV8VaO7XjcNI4m5YtWxaPQ4YMiWHDhhWDltu0aRPf/OY3i6BjJhYAUPFhZ+zY
scXj3/3d39U7nqaXn3322cXXP/7xj6Np06bFYoJpllX//v3jtttuK0t5AYDKU1Pp3Vh/TosWLeLWW28tNgCAqhqzAwDwaQk7AEDWhB0AIGvCDgCQNWEHAMiasAMAZE3YAQCyJuwAAFkTdgCArAk7AEDWhB0AIGvCDgCQNWEHAMiasAMAZE3YAQCyVlPuAgDVpfvwyeUuAsD/iZYdACBrwg4AkDVhBwDImrADAGRN2AEAsibsAABZE3YAgKwJOwBA1oQdACBrwg4AkDVhBwDImrADAGRN2AEAsibsAABZE3YAgKzVlLsAAMBfrvvwyVFt3h01oKz/vpYdACBrwg4AkDVhBwDImrADAGRN2AEAsibsAABZE3YAgKwJOwBA1oQdACBrVlAGqEBWyf18VGM983+nZQcAyJqwAwBkTTcWAA1ClxCVSssOAJA1YQcAyJqwAwBkTdgBALIm7AAAWRN2AICsCTsAQNayCTu33nprdO/ePVq0aBFHHnlkzJw5s9xFAgAqQBZh55e//GUMGzYsrrvuunjxxRfj4IMPjv79+8fixYvLXTQAoMyyCDujR4+O8847L84555zYf//9Y9y4cbHTTjvFnXfeWe6iAQBlVvW3i1i3bl288MILMWLEiLpjTZs2jX79+sX06dO3+j1r164ttlrLly8vHlesWNHg5du09qMGf04AqCYrPoPP182ft1Qq5R12/vu//zs2btwYHTp0qHc87f/Xf/3XVr9n5MiR8b3vfe9jx7t27fqZlRMAGqu2N3+2z79y5cpo27ZtvmFne6RWoDTGp9amTZti6dKlsdtuu0WTJk2iUqUEmwLZggULok2bNuUuTkVRN9umbrZN3Wydetk2dVNZdZNadFLQ6dy58ydeV/VhZ/fdd48ddtghFi1aVO942u/YseNWv6d58+bFtrl27dpFtUgvIr9kW6dutk3dbJu62Tr1sm3qpnLq5pNadLIZoLzjjjtGr169Ytq0afVaatJ+7969y1o2AKD8qr5lJ0ldUoMHD47DDz88/uZv/iZuvvnmWL16dTE7CwBo3LIIO2eccUYsWbIkrr322li4cGEccsghMWXKlI8NWq52qestrSW0ZRcc6uaTqJttUzdbp162Td1UZ900Kf25+VoAAFWs6sfsAAB8EmEHAMiasAMAZE3YAQCyJuyU2dNPPx0nn3xysfpjWr150qRJ9c6n8eNpllmnTp2iZcuWxT2/5syZU++atPrzoEGDikWc0uKIQ4YMiVWrVkW1S7f1OOKII6J169bRvn37OOWUU2L27Nn1rlmzZk1cfPHFxerXO++8c5x++ukfW2By/vz5MWDAgOLmsOl5vvWtb8WGDRuimo0dOzYOOuigusW70ppSjz76aDT2etnSqFGjit+roUOHRmOvm3/6p38q6mLzbd99943GXi+13n///TjrrLOKnz+91x544IExa9asaOzvxd27d//Y6yZt6bVSVa+bNBuL8vmP//iP0ne/+93Sgw8+mGbFlR566KF650eNGlVq27ZtadKkSaVXXnml9Pd///elHj16lP73f/+37poTTjihdPDBB5eee+650m9/+9vSXnvtVfrqV79aqnb9+/cvjR8/vvTaa6+VXn755dJJJ51U6tatW2nVqlV111xwwQWlrl27lqZNm1aaNWtW6aijjiodffTRdec3bNhQ6tmzZ6lfv36ll156qajv3XffvTRixIhSNXv44YdLkydPLr311lul2bNnl77zne+UmjVrVtRVY66Xzc2cObPUvXv30kEHHVS67LLL6o431rq57rrrSgcccEDpww8/rNuWLFlSauz1kixdurS0xx57lM4+++zSjBkzSnPnzi1NnTq19Pbbb5ca+3vx4sWL671mHnvsseKz6sknn6yq142wU0G2DDubNm0qdezYsfTDH/6w7tiyZctKzZs3L917773F/htvvFF83/PPP193zaOPPlpq0qRJ6f333y/lJP3SpZ/1qaeeqquL9AH/wAMP
1F3z5ptvFtdMnz692E+/WE2bNi0tXLiw7pqxY8eW2rRpU1q7dm0pJ7vsskvpX//1X9VLqVRauXJlae+99y7emL/85S/XhZ3GXDcp7KQP4q1pzPWSfPvb3y716dNnm+e9F/9/6Xdpzz33LOqkml43urEq2Lx584pFElNz6eb3ADnyyCNj+vTpxX56TM2lafXoWun6pk2bxowZMyIny5cvLx533XXX4vGFF16I9evX16uf1CzfrVu3evWTmqM3X2Cyf//+xQ3rXn/99cjBxo0b47777itWDU/dWeolimb11Gy+eR0kjb1uUrdL6jL/67/+66K7JXUvJI29Xh5++OHiPfQrX/lK0c1y6KGHxu2331533nvxn6xbty7uvvvuOPfcc4uurGp63Qg7FSz9ciVbrgSd9mvPpcf0y7m5mpqaIhDUXpODdL+zNO7imGOOiZ49exbH0s+X7o225U1ct6yfrdVf7blq9vvf/77oI0+rlV5wwQXx0EMPxf7779/o6yUFvxdffLEY87Wlxlw36YP5rrvuKlaXT2O+0gf4l770peKO0Y25XpK5c+cWdbL33nvH1KlT48ILL4xLL700JkyYUJz3XvwnaUzpsmXL4uyzzy72q+l1k8XtIshf+kv9tddei2eeeabcRakYX/ziF+Pll18uWrz+7d/+rbg/3FNPPRWN2YIFC+Kyyy6Lxx57LFq0aFHu4lSUE088se7rNLg9hZ899tgj7r///mLAbWOW/phKLTI33HBDsZ9adtL7zbhx44rfK/7kjjvuKF5HqXWw2mjZqWAdO3YsHrcc2Z72a8+lx8WLF9c7n0a5p1kBtddUu0suuSQeeeSRePLJJ6NLly51x9PPl5pV018an1Q/W6u/2nPVLP1Ftddee0WvXr2KVoyDDz44brnllkZdL6lZPf0+HHbYYcVf1WlLAXDMmDHF1+kvysZaN1tKf43vs88+8fbbbzfq10ySZlilVtHN7bfffnXdfN6LI9577714/PHH4x//8R/rjlXT60bYqWA9evQoXgzTpk2rO5b6OVP/bxqbkaTH9EJLb/K1nnjiieIvlfSXWzVLY7ZT0EndM+lnSvWxufQh36xZs3r1k6ampzeozesndfds/iaU/upPU0O3fHOrdun/+dq1axt1vfTt27f4uVKLV+2W/mJP41Nqv26sdbOlNCX6nXfeKT7oG/NrJknd41sua/HWW28VLV9JY38vTsaPH19006WxcLWq6nXzuQ2FZpuzRtJ0vLSl/x2jR48uvn7vvffqpju2a9eu9Ktf/ar06quvlgYOHLjV6Y6HHnpoMWXymWeeKWahVPt0x+TCCy8spnr+53/+Z72pjx999FHdNWnaY5qO/sQTTxTTHnv37l1sW057PP7444vp61OmTCl94QtfqPrpssOHDy9mpc2bN694XaT9NOvjN7/5TaOul63ZfDZWY66bK664ovhdSq+ZZ599tpgKnKYAp1mOjbleapcpqKmpKf3gBz8ozZkzp3TPPfeUdtppp9Ldd99dd01jfi/euHFj8dpIs9a2VC2vG2GnzNJaBSnkbLkNHjy4OJ+m911zzTWlDh06FNMc+/btW6yrsrk//vGPxS/UzjvvXEznO+ecc4oQVe22Vi9pS2vv1EpvNBdddFEx7Tq9OZ166qlFINrcu+++WzrxxBNLLVu2LN7c05v++vXrS9Xs3HPPLdYF2XHHHYs3jvS6qA06jble/pKw01jr5owzzih16tSpeM381V/9VbG/+ToyjbVeav36178uPpTT++y+++5b+vnPf17vfGN+L546dWrx3rvlz1tNr5sm6T+fXzsSAMDny5gdACBrwg4AkDVhBwDImrADAGRN2AEAsibsAABZE3YAgKwJOwBA1oQdACBrwg4AkDVhBwDImrADAETO/h9+i8HDnCDy/gAAAABJRU5ErkJggg==",
      "text/plain": [
       "<Figure size 640x480 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "df['Average Mz'].plot.hist()"
   ]
  },
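  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A histogram groups the values into equal-width bins and counts how many fall into each one. The same idea can be inspected as numbers with `pd.cut` and `value_counts`. This is a minimal sketch on made-up values; `mz`, the choice of 3 bins, and the numbers themselves are assumptions for illustration, and bin edges may differ slightly from the ones a plot would use."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical m/z values, not taken from the LC-MS table\n",
    "mz = pd.Series([100.0, 150.0, 220.0, 480.0, 520.0, 610.0, 700.0])\n",
    "\n",
    "# Split the value range into 3 equal-width bins\n",
    "binned = pd.cut(mz, bins=3)\n",
    "\n",
    "# Count values per bin, keeping the bins in ascending order\n",
    "counts = binned.value_counts(sort=False)\n",
    "counts"
   ]
  },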
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
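    "### Homework\n",
    "- Read the LC-MS.csv file\n",
    "- Compute the row-wise mean of the peak areas (G2_1, G2_2, G2_3)\n",
    "- Show the distribution of the means\n",
    "- Save the script as an HTML file\n",
    "- Send the HTML file to 6256691@qq.com with the email subject: python智能化学作业01\n",
    "- Submit before March 24"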
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
